Hello,
Redis 2.8.9 is out and is the strangest beast in our release
history... a patch release that only adds significant new features, and
not a single bug fix.
The lack of bug fixes is actually good news: no significant issue was
found in many weeks, which means that the 2.8 branch is starting to be
pretty stable.
The new features introduced in 2.8.9 are:
1) A new data structure, the HyperLogLog, partially documented here:
http://antirez.com/news/75 (read more about it in the next sections of
this email).
2) Three new commands for the sorted set data structure in order to
support lexicographical range queries in Redis: ZRANGEBYLEX,
ZLEXCOUNT, ZREMRANGEBYLEX.
The rest of this email is split into sections to make for an easier read.
Why a 2.8 patch release with new features?
===
After months of working on Redis Cluster and Sentinel, people using
Redis as a pure in-memory data structure server, with a strong interest
in the capabilities of Redis from that point of view, and less interest
in the storage, HA, and sharding aspects of the system, were starting to
feel a bit left out. The Redis project really was catalyzed by cluster,
cluster, cluster, for a long time. The idea, as announced here before,
was to use the 3.2 release as a "back to data structures" theme for
new Redis developments, but that was *a lot* of time to wait. So I
decided to act ASAP and provide two self-contained but important new
tools to Redis users. Both of the new features are unlikely to
have any stability impact on Redis if you don't use them, so it was
possible to ship them ASAP in a stable release.
Also, before being put into a stable release, those changes were
tested by me and other users for a significant amount of time, and they
are working very reliably.
Now it is time to focus again on Redis Cluster and Sentinel for the
coming months, but I hope Redis 2.8.9 sends a signal to the Redis
community that we are not going to turn into a different thing: Redis
Cluster will be a feature, like Redis persistence or replication, but
the main goal of the Redis project remains to provide a core of
easy-to-exploit in-memory data structures.
HyperLogLog
===
The blog post linked above hopefully does a decent job of describing
what the new data structure is; however, the recent developments that
went into 2.8.9 are not documented there. Namely, the support merged
into 2.8.9 includes a sparse representation of HyperLogLogs. This
means that HLLs with low cardinalities don't need to use 12k of
memory, but a lot less.
Let's be pragmatic: HLLs observing N elements use M bytes of memory,
as in the following table:
N=10, M=47
N=100, M=290
N=500, M=1046
N=1000, M=1894
N=4000, M=12304
As you can see, at N=4000 the HyperLogLog is switched to the 12k-using
dense representation. But for lower cardinalities, the amount of
memory saved is big: if you have many HLLs counting different things
with a long-tail distribution, this really makes the difference between
being able to apply Redis or not.
Most of the complexity in the hyperloglog.c file is due to the sparse
representation, but I believe it was worth it. The encoding I used is
"proprietary", in the sense that it was invented specifically for
Redis' requirements of speed and in-place updates, plus the attempt to
be usage-pattern agnostic, that is, the representation is optimized
to work when PFCOUNT is called as often as PFADD. The moment it
switches from the sparse representation to the dense one can be
configured. By default we switch when the sparse representation uses
3000 bytes (1/4 of the dense representation's memory usage), as this is
the point where we are again on par with the order of magnitude of
memory used per element observed, and the point where the CPU cost of
supporting the sparse representation becomes significant.
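For reference, the switch point is controlled by a redis.conf
directive, shown here with its default value:

```
hll-sparse-max-bytes 3000
```

Setting it to 0 effectively forces the dense representation from the start.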
The commands PFADD, PFCOUNT, and PFMERGE are documented at
redis.io.
The dense and sparse representations used are documented in the
hyperloglog.c file comments.
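To make the counting idea concrete, here is a minimal pure-Python
sketch of a *dense* HyperLogLog. P=14 (16384 registers) matches Redis,
but the hash function and all names are mine, purely illustrative; the
real hyperloglog.c adds further corrections and the sparse encoding on
top of this idea:

```python
import hashlib
import math

P = 14                              # 2^14 = 16384 registers, as in Redis
M = 1 << P
ALPHA = 0.7213 / (1 + 1.079 / M)    # standard bias-correction constant

def _hash64(item):
    # 64-bit hash of the item (Redis uses its own 64-bit hash function)
    return int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], 'big')

class HLL:
    def __init__(self):
        self.reg = [0] * M          # the dense register array

    def add(self, item):
        h = _hash64(item)
        idx = h & (M - 1)           # low P bits select the register
        rest = h >> P               # remaining bits feed the rank
        rank = 1                    # 1-based position of the first set bit
        while rest & 1 == 0 and rank <= 64 - P:
            rank += 1
            rest >>= 1
        if rank > self.reg[idx]:
            self.reg[idx] = rank

    def count(self):
        # Raw estimate: bias-corrected harmonic mean of the registers.
        s = sum(2.0 ** -r for r in self.reg)
        est = ALPHA * M * M / s
        # Small-range correction: fall back to linear counting.
        zeros = self.reg.count(0)
        if est <= 2.5 * M and zeros:
            est = M * math.log(M / zeros)
        return int(est)
```

Adding 10000 distinct elements to this sketch yields an estimate close
to 10000 while storing only register maxima, never the elements
themselves.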
Sorted sets lexicographic range queries
===
Sorted sets were, since the start, about range queries: being an
ordered collection of elements, ranges are what matters most, and
getting the top N elements or a by-score interval are the common
operations. However, even if sorted set elements were already ordered
lexicographically when inserted with the same score, there was no easy
way to query by lexicographical range. It was possible to more or
less do it by exploiting the score plus a few Lua scripts, but this is
not a very high-performance business, nor is it as simple as it should
be.
Lexicographical range queries turn Redis into many things, for example
into a high-performance autocompletion server: I wrote a demo about
this use case:
http://autocomplete.redis.io/
You can do a lot more stuff with it as explained already in this
mailing list, like building secondary indexes, graphs, and so forth.
The new use cases are supported by three new commands: ZRANGEBYLEX,
ZLEXCOUNT, ZREMRANGEBYLEX. The commands are documented at
redis.io.
No internal change was performed to the sorted set implementation, so
the memory requirements, RDB file format, and everything else is
untouched.
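As a quick illustration, a lexicographic range query looks something
like this from redis-cli (all elements share score 0, so they are
ordered lexicographically; `[` means inclusive, `(` exclusive, and
`-`/`+` are the infinite limits):

```
redis> ZADD myindex 0 "apple" 0 "apricot" 0 "banana" 0 "cherry"
(integer) 4
redis> ZRANGEBYLEX myindex [a (b
1) "apple"
2) "apricot"
redis> ZLEXCOUNT myindex - +
(integer) 4
```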
It is interesting how, using different sorted set features, you can
build an incremental autocompletion engine with a few lines of code.
For example:
1) Every time you see a query, you use ZINCRBY in a sorted set to
increment its popularity.
2) Every time a given entry reaches a given popularity, you insert it
into the sorted set you use to autocomplete.
This is the vanilla schema, but of course a lot of refinements are
possible, like only keeping the top N entries by frequency, or having
multiple sorted sets with different "ranks" that you can use in order
to sort the autocomplete list you provide to the user, and so forth.
Also in many real-world setups, you want to trim the first sorted set
from time to time to just retain the top elements.
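To make the schema above concrete, here is a tiny in-memory Python
sketch: a dict stands in for the ZINCRBY popularity sorted set, a
sorted list for the score-0 sorted set you would query with
ZRANGEBYLEX, and the threshold and names are made up for the example:

```python
import bisect

PROMOTE_AT = 3   # popularity threshold (arbitrary, for the example)
counts = {}      # query -> popularity (the ZINCRBY sorted set)
index = []       # lexicographically sorted promoted queries

def saw_query(q):
    # Step 1: ZINCRBY-like popularity bump.
    counts[q] = counts.get(q, 0) + 1
    # Step 2: once popular enough, promote into the autocomplete index.
    if counts[q] == PROMOTE_AT:
        bisect.insort(index, q)

def complete(prefix, limit=5):
    # Same spirit as ZRANGEBYLEX index [prefix (prefix+\xff, with a LIMIT.
    i = bisect.bisect_left(index, prefix)
    out = []
    while i < len(index) and index[i].startswith(prefix) and len(out) < limit:
        out.append(index[i])
        i += 1
    return out
```

With Redis itself, the same two steps map directly onto ZINCRBY on the
counter set and ZADD plus ZRANGEBYLEX on the autocomplete set.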
It's the kind of stuff where it makes sense to build libraries
abstracting Redis away.
Ok, that's all for 2.8.9. I hope you'll enjoy this new Redis release
in the coming days. If you find issues, please report them ASAP via
GitHub or by replying here.
Regards,
Salvatore
--
Salvatore 'antirez' Sanfilippo
open source developer - GoPivotal
http://invece.org
To "attack a straw man" is to create the illusion of having refuted a
proposition by replacing it with a superficially similar yet
unequivalent proposition (the "straw man"), and to refute it
— Wikipedia (Straw man page)