On Fri, Dec 6, 2013 at 7:22 PM, Pierre Chapuis
<catwell...@catwell.info> wrote:
Indeed! Until you have bumped into all the hidden obstacles, the experience
is rather horrible. When Redis blows up in production, it usually
costs developers a few gray hairs :-)
If the learning curve is flat, it usually means that the tool is too
trivial to be useful.
On Fri, Dec 6, 2013 at 4:22 PM, Pierre Chapuis
<catwell...@catwell.info> wrote:
> Others:
>
> Quentin Adam, CEO of Clever Cloud (a PaaS) has a presentation that says
> Redis is not fit to store sessions:
> http://www.slideshare.net/quentinadam/dotscale2013-how-to-scale/15 (he
> advises Membase)
To be super-honest, I don't quite understand the presentation: what do
"multiple writes" / "pseudo atomic" mean? I'm not sure.
> Then there's the Disqus guys, who migrated to Cassandra,
I have no idea why Disqus migrated to Cassandra; probably it was just a
much better pick for them?
Migrating to a different system does not necessarily imply a problem with
Redis, so this is not criticism we can act on in a positive way,
unless the Disqus guys write up why they migrated and what Redis
deficiencies they found.
> This presentation about scaling Instagram with a small
> team (by Mike Krieger) is very interesting as well:
> http://qconsf.com/system/files/presentation-slides/How%20a%20Small%20Team%20Scales%20Instagram.pdf
> He says he would go with Redis again, but there are
> some points about scaling up Redis starting at slide 56.
This is interesting indeed, and it sounds like problems that we can solve
with Redis Cluster. [...]
Let's face it, client-side partitioning is complex. Redis Cluster
provides a lot of help for big players with many instances, since
operations will be much simpler once you can reshard live.
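To illustrate the pain point, here is a minimal client-side sharding sketch in Python with redis-py (ports, key names and values are hypothetical): the shard map is fixed, so adding a node later means re-mapping and migrating keys by hand, which is exactly the operational work live resharding removes.

    import binascii
    import redis

    # Hypothetical static shard map: three standalone Redis instances.
    SHARDS = [redis.StrictRedis(host="localhost", port=p) for p in (6379, 6380, 6381)]

    def node_for(key):
        # A stable hash of the key decides which instance owns it.
        return SHARDS[binascii.crc32(key.encode("utf-8")) % len(SHARDS)]

    node_for("user:1000").set("user:1000", "some value")
    # Adding a fourth shard changes the mapping for most keys, so the data
    # has to be migrated by hand while the application keeps running.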
We suspect that trading off implementation flexibility for
understandability makes sense for most system designs.
— Diego Ongaro and John Ousterhout (from Raft paper)
One of the big challenges we had with Redis at MercadoLibre was the size of the dataset. The fact that it needs to fit in memory was a big issue for us.
We commonly had 500 GB databases or even more.
Not sure if this is a common case for other Redis users anyway.
It's not that we planned it. Developers started using it for something they thought would stay small, but it grew. And it grew a lot. We ended up using Redis to cache a small chunk of the data, with MySQL or Oracle as the backend data store.
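That cache-aside setup is roughly the following (a minimal sketch in Python with redis-py; the key scheme, the TTL and the fetch_from_db callable are made up for illustration):

    import json
    import redis

    r = redis.StrictRedis()

    def get_product(product_id, fetch_from_db, ttl=300):
        key = "product:%d" % product_id
        cached = r.get(key)
        if cached is not None:
            return json.loads(cached)              # hot path: served from Redis
        value = fetch_from_db(product_id)          # miss: hit MySQL / Oracle
        if value is not None:
            r.setex(key, ttl, json.dumps(value))   # cache the hot chunk with a TTL
        return value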
Hello dear Redis community,
today Pierre Chapuis started a discussion on Twitter about Redis
bashing, stimulated by this thread on Twitter from Rick Branson:
https://twitter.com/rbranson/status/408853897495592960
It is not the first time that Rick Branson, who works at Instagram,
has openly criticized Redis; I guess he does not like the Redis
design and/or implementation.
However, according to Pierre, this is not something limited to Rick:
there are other engineers in the SF area who believe that Redis
sucks, and Pierre also reports hearing similar stories in Paris.
Of course every open source project of a certain size is a target of
critiques, especially a project like Redis that is very opinionated about
how programs should be written, with a search for simple design and
implementation that is sometimes felt to be sub-optimal.
However, what can we learn from these critiques, and what do you
think is not working well in Redis? I really encourage you to share
your view.
As a starting point I'll use Rick's tweet: "BGSAVE. the sentinel wtf.
memory cliffs. impossible to track what's in it. heap fragmentation.
LRU impl sux. etc et".
He also writes: "you can't even really dump the whole keyspace because
KEYS "*" causes it to shit it's"
This is a good starting point, and I'll use the rest of this email to
see what happened in the different areas of Redis criticized by Rick.
1) BGSAVE
I'm not sure what is wrong with BGSAVE; probably Rick had bad
experiences with EC2 instances, where the fork time can create latency
spikes?
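For what it's worth, the cost of that fork is easy to observe: the INFO stats section reports how long the last fork took, so you can check whether background saves are the source of the spikes (the number below is made up):

    $ redis-cli INFO stats | grep latest_fork_usec
    latest_fork_usec:141385

That would be about 141 ms spent inside fork() for the last BGSAVE; on virtualized environments like EC2/Xen this grows considerably with the instance's memory size, which matches the latency spikes mentioned above.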
2) The Sentinel WTF.
Here probably the reference is the following:
http://aphyr.com/posts/283-call-me-maybe-redis
Aphyr analyzed Redis Sentinel from the point of view of a consistent
system, consistent as in CAP "strong consistency". During partitions in
Aphyr's tests, Sentinel was not able to uphold the promises of a CP
system.
I replied with a blog post trying to clarify that Redis Sentinel is
not designed to provide strong consistency in the face of partitions,
but only to provide some degree of availability when the master
instance fails.
However, the implementation of Sentinel, even as a system for promoting a
slave when the master fails, was not optimal, so there was work to
reimplement it from scratch. The new Sentinel is now available in
Redis 2.8.x
and is much simpler to understand and predict. This is surely an
improvement. The new implementation is able to version configuration
changes that are eventually propagated to all the other
Sentinels, requires a majority to perform the failover, and so forth.
However, if you understand even the basics of distributed programming
you know a few things, like how a system with asynchronous replication
is not capable of guaranteeing consistency.
Even if Sentinel was not designed for this, is Redis improving from
this point of view? Probably yes. For example, the unstable branch now
has support for a new command called WAIT that implements a form of
synchronous replication.
Using WAIT and the new Sentinel, it is possible to have a setup that
is quite partition resistant. For example, if you have three computers,
A, B, C, and run a Sentinel instance and a Redis instance on every
computer, only the majority partition will be able to perform the
failover, and the minority partition will stop accepting writes if you
use "WAIT 1", that is, if you wait for the propagation of the write to at
least one replica. The new Sentinel also automatically elects the slave
that has the most updated version of the data.
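To make the "WAIT 1" example concrete, this is roughly how it looks at the command level (the reply value is illustrative): WAIT takes the number of replicas to reach and a timeout in milliseconds, and returns how many replicas acknowledged the writes performed so far on that connection.

    127.0.0.1:6379> SET mykey "somevalue"
    OK
    127.0.0.1:6379> WAIT 1 100
    (integer) 1

A reply of 0 would mean the write did not reach any replica within 100 milliseconds, so the application can treat it as possibly lost.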
Redis Cluster is another step forward towards Redis HA and automatic
sharding, we'll see how it works in practice. However I believe that
Sentinel is improving and Redis is providing more tools to fine-tune
consistency guarantees.
3) Impossible to track what is in it.
Lack of SCAN was a problem indeed; now it is solved. Even before, using
RANDOMKEY it was somewhat possible to inspect data sets, but SCAN is
surely a much better way to do this.
The same argument goes for KEYS *.
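For reference, incremental iteration looks roughly like this (cursor values and key names are illustrative); you keep feeding the returned cursor back into SCAN until it comes back as 0:

    127.0.0.1:6379> SCAN 0 MATCH user:* COUNT 100
    1) "1152"
    2) 1) "user:1000"
       2) "user:1042"
    127.0.0.1:6379> SCAN 1152 MATCH user:* COUNT 100
    1) "0"
    2) 1) "user:2177"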
4) LRU implementation sucks.
The LRU implementation in Redis 2.4 had issues, and under mass-expire
there were latency spikes.
The LRU in 2.6 is much smoother; however, it contained issues, reported
by Pavlo Baron, where the algorithm was not able to guarantee that expired
keys were always kept under a given threshold.
Newer versions of 2.6, and 2.8 of course, both fix this issue.
I'm not aware of remaining issues with the LRU algorithm.
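For completeness, eviction behavior is driven by a few redis.conf directives; the LRU is approximated by sampling, and raising the sample size trades CPU for precision (the values below are only an example):

    maxmemory 2gb
    maxmemory-policy allkeys-lru
    maxmemory-samples 5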
I have the feeling that Rick's opinion is a bit biased by the fact that
he was exposed to older versions of Redis; however, his criticisms were
in part actually applicable to those older versions of Redis.
This shows that there is something good about these critiques. For
instance, Rick always said that replication sucked because of the lack of
partial resynchronization. I'm sorry he is no longer able to say this.
As a consolation prize we'll send him a t-shirt if the budget permits.
But this again shows that critiques tend to be focused where
deficiencies *are*, so trying to hide Redis's deficiencies is not a good
idea IMHO. We need to improve the system to make it better, as long as it
is still a useful system for many users.
So, what are the critiques that you hear frequently about Redis? What
are your own critiques? When does Redis suck?
Let's tear Redis apart, something good will happen.
Salvatore
--
Salvatore 'antirez' Sanfilippo
open source developer - GoPivotal
http://invece.org
We suspect that trading off implementation flexibility for
understandability makes sense for most system designs.
— Diego Ongaro and John Ousterhout (from Raft paper)
On Fri, Dec 6, 2013 at 11:46 PM, Alberto Gimeno Brieba
<gime...@gmail.com> wrote:
> I think that having something like NDS officially supported would make redis
> a great option for many more usage cases. Many times the 90% of the "hot
> data" in your db fits in an inexpensive server, but the rest of the data is
> too big and would be too expensive (unaffordable) to have enough RAM for it.
> So in the end you choose other db for the entire dataset.
I completely understand this, but IMHO to make Redis on disk right we need:
1) An optional threaded model. It could be used to dispatch only
slow queries and on-disk queries. Threads are not a good fit for Redis
in memory, I think. Conversely, I believe that threads are the key to a
good on-disk implementation.
2) Representing every data structure on disk in a native way. Mostly a
btree of btrees or similar, but there is definitely some work ahead to
understand what to use or what to implement.
On Fri, Dec 6, 2013 at 9:07 PM, Aphyr Null <aphyr...@gmail.com> wrote:
> While I am enthusiastic about the Redis project's improvements with respect
> to safety, this is not correct.
It is not correct if you take it as "strong consistency", because there
are definitely failure modes; basically, it is not as if synchronous
replication + failover turned the system into Paxos or Raft. For
example, if the master returns writable when the failover has already
started, we are no longer sure to pick the slave with the best
replication offset. However this is definitely "more consistent" than
in the past, and it is probably possible to achieve strong consistency
if you have a way to stop writes during the replication process.
On Fri, Dec 6, 2013 at 7:29 PM, Josiah Carlson <josiah....@gmail.com> wrote:
> Long story short: every one of the existing data structures in Redis can be
> improved substantially. All of them can have their memory use reduced, and
> most of them can have their performance improved. I would argue that the
> ziplist encoding should be removed in favor of structures that are concise
> enough to make the optimization unnecessary for structures with more than 5
> or 10 items. If the intset encoding is to be kept, I would also argue that
> it should be modified to apply to all sets of integers (not just small
> ones), and its performance characteristics updated if it happens that the
> implementation changes to improve large intset performance.
Hello Josiah, thanks for your contribution. I agree with you: it is exactly
another case of "this is the simplest way to avoid work given that it
is good enough".
This would deserve a person allocated solely to it, able to make
steady progress and merge code when it is mature / tested enough
to avoid disasters, since it is a very sensitive area.
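For context, these are the encoding thresholds being discussed; below these limits Redis uses the compact encodings, above them it switches to the general-purpose ones (the numbers are the usual defaults of the 2.6/2.8 era, check your own redis.conf):

    hash-max-ziplist-entries 128
    hash-max-ziplist-value 64
    list-max-ziplist-entries 128
    list-max-ziplist-value 64
    set-max-intset-entries 512
    zset-max-ziplist-entries 128
    zset-max-ziplist-value 64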
Others:
Quentin Adam, CEO of Clever Cloud (a PaaS), has a presentation that says Redis is not fit to store sessions: http://www.slideshare.net/quentinadam/dotscale2013-how-to-scale/15 (he advises Membase)
Tony Arcieri (Square, ex-LivingSocial) is a "frequent offender":
https://twitter.com/bascule/status/277163514412548096
https://twitter.com/bascule/status/335538863869136896
https://twitter.com/bascule/status/371108333979054081
https://twitter.com/bascule/status/390919938862379008
Then there's the Disqus guys, who migrated to Cassandra,
the Superfeedr guys who migrated to Riak...
Instagram moved to Cassandra as well, here's more on
it by Branson to see where he comes from:
http://www.planetcassandra.org/blog/post/cassandra-summit-2013-instagrams-shift-to-cassandra-from-redis-by-rick-branson
This presentation about scaling Instagram with a small
team (by Mike Krieger) is very interesting as well:
http://qconsf.com/system/files/presentation-slides/How%20a%20Small%20Team%20Scales%20Instagram.pdf
He says he would go with Redis again, but there are
some points about scaling up Redis starting at slide 56.
My personal experience, to be clear, is that Redis is an
awesome tool when you know how it works and how to
use it, especially for a small team (basically like Krieger's).
I have worked for a company with a very reduced technical
team for the last 3.5 years. We make technology for mobile
applications which we sell to large companies (retail, TV,
cinema, press...) mostly white-labelled. I have written most
of our server side software, and I have also been responsible
for operations. We have used and still use Redis *a lot*, and
some of the things we have done would just not have been
possible with such a reduced team in so little time without it.
So when I read someone saying he would ban Redis from
his architecture if he ever makes a startup, I think: "good
thing he doesn't." :)
Thank you Antirez for this awesome tool.
On Sat, Dec 7, 2013 at 1:21 AM, Kelly Sommers <kell.s...@gmail.com> wrote:
> Descriptions like this indicate the trade-offs aren't understood, explicitly
> chosen and designed or accounted for. What is Redis trying to be? Is Redis
> trying to be a CP or AP system? Pick one and design it as such. From my
> perspective, with masters and slaves, Redis is trying to be a CP system but
> it's not achieving the goals. If it's trying to be an AP system, it isn't
> achieving those goals either.
I believe there is a place for "relaxed" CP systems. In Redis, by
default, replication is asynchronous, and most people will use it
this way. At the same time, because of the data model and the size of the
aggregate values single keys can hold, I don't want application-assisted
merges, semantically. Relaxed CP systems can trade part of their
consistency properties for performance and simple semantics; I'm not
sure why this is not acceptable.
WAIT can also be used, by improving the failover procedure, in order
to have a strongly consistent system (no writes to the old master from
the point the failure detection is positive to the end of the
failover when the configuration is updated; or, alternatively, disconnect
the majority of slaves you can reach during the failure detection so
that every write will fail during this time).
WAIT also reduces the real-world "holes" that you face if the failure
detection is not designed to be safe.
For people it is important how systems behave in practice. /dev/null
does not have the same consistency as asynchronous replication, for example.
Similarly, users can say: I'm OK with a system that has excellent
latency and IOPS when everything is fine but is not able to offer
strong consistency; however, when shit happens, given that it can't
guarantee strong consistency, what degree of consistency will it
offer? What is the contract with the user?
I find the idea that there is either "strong consistency" or nothing
incorrect; the AP systems you cite are a perfect example of that.
Wall-clock last-write-wins is one model, but there are better, more
costly models, and so forth.
From the point of view of weak CP systems you can see this in terms of
the kind of partition you have to create for inconsistencies to be
created.
There are systems where, of all the partitions and failures possible,
only a small subset will create inconsistencies; there are other
systems that are affected by a larger subset.
To reply with a counter-example that is as pointless as your
pregnancy example: car safety is not just a choice between cars in which
I can't be killed in an accident and cars in which I die.
Different cars have different safety levels, and if you want to
drive faster, you are more exposed.
> Similar to ACID properties, if you partially provide properties it means the
> user has to _still_ consider in their application that the property doesn't
> exist, because sometimes it doesn't. In you're fsync example, if fsync is
Yes, but there are applications where data loss is totally acceptable
if it is an exception that happens with a given probability and with
known bounds on the amount of writes lost.
There are instead applications where data loss is unacceptable, so one
loss or ten losses are the same, and there you need a CP system.
> This conversation is about how to fix that, which I would love to see! It
> starts with a simple, but hard question to answer. What do you want a Redis
> cluster to be?
That's pretty simple: Redis Cluster can't be CP because the
performance would be unacceptable for the way most people use Redis.
However, Redis could be optionally CP for some operations, and I believe
that WAIT is a start in that direction: not enough, more work is needed in
the leader switch to make the process safe. But that's optional, so
let's reason about: no synchronous replication by default.
Redis also can't accept a distribution model where there is a need
for merging values, since a single value can easily be a sorted set with
two billion elements, or the like. Timestamping each element with a
logical clock is crazy, for instance, and the time needed to analyze and
merge such big values can be seriously long; the semantics are not trivial
to predict in real use cases.
So: no synchronous replication, no merges. What guarantees should it
be able to provide? The best consistency that it is possible to achieve
while surviving certain partitions.
By certain partitions I mean that the majority of masters, with at least
a slave for every hash slot, should be able to continue operations.
My feeling is that under the above assumptions the best model is a CP
model with bounded windows in which writes can be lost. The tradeoffs of
Redis Cluster are also different for clients in the majority partition
and clients in the minority partition.
Kelly sent me this via private email by mistake, but it was intended
to be public, so here is my reply:
[snip]
On Sun, Dec 8, 2013 at 9:04 AM, Kelly Sommers <kell.s...@gmail.com> wrote:
> Couple points about #2. Firstly, there are many ways to optimize disk usage
> commit them in a single fsync and acknowledging them all. There are also
Redis already does this when fsync = always.
There are three processes: A, B, C. Process A replicates to B, receives
the acknowledgment, and replies OK to the client since the majority was
reached.
Process A fails, and at the same time process B reboots. Process B becomes
available again after the reboot; there is a majority, B and C, that
can continue, however the write is lost.
As for Redis Cluster... Kelly is completely right in that the problem
is to define what Redis is. If I had to name *one* property of Redis
that makes people use it, it would be performance (low latency (*),
high throughput), for both reads and writes. This is basically the
reason why it is an in-memory system.
> There is no useful (*) distributed CA system. CA means partitions
> cannot happen, which means a single node system. But then can we
> really say it is highly available?
This is why I use "CA" to mean systems that provide no availability during
partitions but that are able to stop working instead of providing
wrong results when partitions happen.
It is just a way to name things; if "CA" is not the best term, we can call
it something else, but I find it helpful to say "CA" from the point of
view of finding a common name for those kinds of systems.
> I don't think it will be possible to keep these properties with a
> CP system. Inter-node network latencies will be deadly. So I
> just don't think it makes sense to try to make Redis cluster CP.
I agree; as already stated, the default operations can't be CP. However,
I would be enthusiastic to have an optional CP mode based on WAIT that
is able to do its work without affecting the other clients, and I
think this is possible to achieve in accordance with the other goals
exposed.
Why not... I think Brewer said explicitly in a paper (which I cannot find
right now) that CAP was to be understood for some data at some point
in time. So you can have a system that is CP for some data and CA for
other data, and you can have a system that switches between CA and
CP modes. But I think that this "CP" mode should be seen as a bonus,
and should not hinder the "natural" distributed mode for Redis, which is AP.
On Sunday, December 8, 2013 at 2:25:52 PM UTC+1, Salvatore Sanfilippo wrote:
>> There is no useful (*) distributed CA system. CA means partitions
>> cannot happen, which means a single node system. But then can we
>> really say it is highly available?
>
> This is why I use "CA" to mean systems that provide no availability during
> partitions but that are able to stop working instead of providing
> wrong results when partitions happen.
> It is just a way to name things; if "CA" is not the best term, we can call
> it something else, but I find it helpful to say "CA" from the point of
> view of finding a common name for those kinds of systems.
This looks like the definition of CP to me. It will prefer to stop
working instead of compromising consistency for the sake of
availability.
> As for Redis Cluster... Kelly is completely right in that the problem
> is to define what Redis is. If I had to name *one* property of Redis
> that makes people use it, it would be performance (low latency (*),
> high throughput), for both reads and writes. This is basically the
> reason why it is an in-memory system.
However! Switching between CP and AP for the same data means you are basically an AP system. From the perspective of the actors and observers of the system, they can't trust the system to ever be correct, so they must assume that AP mode happens anyway. A CP system means that actors and observers have a set of guarantees. If that can be traded off, then the application must account for this trade-off. Even more problematic, if this toggle is done with a command like WAIT, a misbehaving application can cause incorrect state for well-behaved applications. We must consider the serializability implications when CP can be circumvented.
I may be wrong though, because I don't understand the Cluster
replication algorithm. Maybe if Antirez could publish an explanation
of how it works and the assumptions it makes (comparable to the
Raft paper and associated lecture slides) it would answer a lot of
the questions people have. But I can understand this would be a
*lot* of work...
On Sun, Dec 8, 2013 at 7:46 PM, Pierre Chapuis
<catwell...@catwell.info> wrote:
> Antirez cites Raft as an example, but Raft is all about leader election.
> In Redis Cluster the guarantees that Raft offers are apparently not
> there, and the WAIT command cannot provide them anyway.
I cited Raft as an example of a CP system with false negatives. It
ensures that a positive reply means the entry will be applied to the
state machine, but it does not offer the opposite guarantee when a
negative reply is provided to the client. This totally makes sense
IMHO, for a number of reasons, in the case of Raft.
What about using an already working disk key-value store like leveldb, rocksdb (http://rocksdb.org), lmdb (like nds does https://github.com/mpalmer/redis/tree/nds-2.6/deps/liblmdb ), etc.?
Please note that WAIT provides, in the context of the CAP theorem, exactly zero of consistency, availability, and partition tolerance. Labeling it CA or "relaxed CP" is misleading at best and dangerous at worst.
> Because you're trying to pretend to be a CP system (but not one) with things like WAIT, you will
> have a horde of users not understanding what a failed WAIT that writes to 1 node but not 2 nodes
> means. The ones who do understand what this means (after some pain in production) will learn
> that this operation doesn't work as expected and will have to consider WAIT having AP like
> semantics. Similar to CL.ALL.
Precisely. WAIT is *not* a consensus algorithm and it *can not* provide serializable semantics without implementing some kind of coherent transactional rollback.
> "CA" mode is often a way to refer to systems that are not partition
> tolerant but consistent.
I have yet to encounter any system labeled "CA" which actually provided CA. This should not be surprising because CA has been shown to be impossible in real-world networks. Please read http://lpd.epfl.ch/sgilbert/pubs/BrewersConjecture-SigAct.pdf.
> Raft faces the user with the same exact tradeoff, when Raft replies
> that it failed to replicate to the majority, it really means that the
> result is undetermined.
You have not implemented or described RAFT's semantics in Redis, and failing to understand how Redis WAIT differs from RAFT, VR, multipaxos, etc is a dangerous mistake. Consensus protocols are subtle and extremely difficult to design correctly. Please consider writing a formal model and showing verification by a model checker, if not a proof.
> I think that Raft semantics is good enough for most use cases
Please don't claim these are equivalent designs. In particular, the RAFT inductive consistency constraint is not present in the current or proposed WAIT/failover design. Without a similar constraint you will not be able to provide linearizability.
> I'll surely do my research, but I'm not a Right Thing person. What I
> mean is that I'll try to provide what I can provide with my best of my
> capabilities now, making clear what are the tradeoffs.
Please consider choosing a proven consistency model and implementing it, instead of rolling your own. Alternatively, consider documenting that Redis can easily lose your data. I see an awful lot of people treating it as a system of record rather than a cache.
On Mon, Dec 09, 2013 at 11:23:41AM +0000, javier ramirez wrote:
> On 07/12/13 01:00, Alberto Gimeno wrote:
> >What about using an already working disk key-value store like
> >leveldb, rocksdb (http://rocksdb.org), lmdb (like nds does
> >https://github.com/mpalmer/redis/tree/nds-2.6/deps/liblmdb ),
> >etc.?
>
> FWIW, I attended a talk by basho the past week and they were talking
> about the upcoming features of riak. One of the new features are
> data types in a similar way to redis (lists, hashes, sets...) but
> running on riak, so with replication and persistence baked in. This
> piqued my curiosity, so I went to talk to the basho people after the
> talk, to see what can be done and how it was implemented.
Good to see others are seeing the value in a data structures server. I can
definitely see the value in being able to operate on more complicated data
structures inside the Riak paradigm, although it's going to start getting
awfully tricky if you still want to use a conflict resolution algorithm more
complicated than LWW.
I came all the way back to the top post, because I kinda feel that the thread has gone astray on the CAP things. Which isn't to diminish those things in any way; it's just that it is now going around in circles.
The reality is that while CAP is interesting, that isn't the only feature a product needs, and comments along the lines of "what does redis want to be when it grows up?" are pretty condescending IMO.
If I had to give a list of things that cause me pain in redis, I would say:
- "keys" et al: which is now a solved problem with "scan"
- how to perform maintenance on a 24x7 master node without having a brief "blip" - this is something I very much hope redis-cluster makes much friendlier
- scalability of large domains - again: redis-cluster
- replication quirks on unreliable connections: I'm hopeful "psync" makes this happier
So actually, most of the things that *are actual real problems for me* : already in hand.
The transaction model takes a little getting used to, but when you get into the "assert, try, redo from start if fail" mindset it is a breeze (and client libraries can do things to help here), so I don't count this as a plus or a minus, just a "difference". Of course, when this isn't practical, Lua allows the problem to be approached procedurally instead.
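To make that mindset concrete, here is a minimal sketch in Python with redis-py (key names and the transfer logic are made up): WATCH is the assert, MULTI/EXEC is the try, and a WatchError sends you back to the start.

    import redis

    r = redis.StrictRedis()

    def transfer(src, dst, amount):
        with r.pipeline() as pipe:
            while True:
                try:
                    pipe.watch(src)                  # assert: fail later if src changes
                    balance = int(pipe.get(src) or 0)
                    if balance < amount:
                        pipe.unwatch()
                        return False
                    pipe.multi()                     # try: queue both writes atomically
                    pipe.decrby(src, amount)
                    pipe.incrby(dst, amount)
                    pipe.execute()                   # commits only if src was untouched
                    return True
                except redis.WatchError:
                    continue                         # redo from start

When the retry loop isn't practical, the same logic can be written as a short Lua script and run atomically with EVAL, as noted above.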
For the good: redis is a kickass product with insanely fast performance and rock solid reliability even when under sustained and aggressive load. The features are versatile allowing complex models to be built from easy to understand primitives.
We love you :p
Marc
While you're on the subject of replication, I suggest you read RFC 4533 (LDAP Content Sync Replication) to get some ideas. Currently your replication protocol's resync after a node disconnect/reconnect is far too expensive.