ZRANGEBYLEX & range queries

2,032 views
Skip to first unread message

Salvatore Sanfilippo

unread,
Apr 5, 2014, 6:25:07 AM4/5/14
to Redis DB
Hello all,

so finally a little dream of mine came true and we have range queries
in the unstable branch.
The code is almost totally a port of the BYSCORE stuff to
lexicographically ordering, exploiting the fact that sorted sets are
ordered lexicographically if the score is the same among items.

Let's start with a few examples to go deeper into the theory and
practice of this, and why I think it is going to be a big deal...

You starting populating a sorted set just with strings set to the same
score (whatever you want, I used 0):

127.0.0.1:6379> zadd zset 0 a 0 b 0 c 0 e 0 foo 0 bar 0 zap 0 AAAAA
(integer) 8

Now you can do range queries. First API concept that may sound non
obvious, we have the concept of string infinite, "-" is string
negative infinite, will be smaller than anything, "+" is string
positive infinite. So asking the range - + means to get all the
elements.

127.0.0.1:6379> zrangebylex zset - +
1) "AAAAA"
2) "a"
3) "b"
4) "bar"
5) "c"
6) "e"
7) "foo"
8) "zap"

When you specify actual strings instead, you are *forced* to prefix
them with ( or [, to mean a non-inclusive or inclusive interval:

So everything up to "b" including it is:

127.0.0.1:6379> zrangebylex zset - [b
1) "AAAAA"
2) "a"
3) "b"

While everything up to "b" excluding it is:

127.0.0.1:6379> zrangebylex zset - (b
1) "AAAAA"
2) "a"

Another example with the obvious usage of start-end string.

127.0.0.1:6379> zrangebylex zset [bar [foo
1) "bar"
2) "c"
3) "e"
4) "foo"

Ok you got it. There is the command with reverse ordering as well:
"zrevrangebylex" (sorry).

127.0.0.1:6379> zrevrangebylex zset [foo [bar
1) "foo"
2) "e"
3) "c"
4) "bar"

Now the important stuff about this is that... there is no collation,
stuff are compared as binary things with memcmp(). Believe it or not,
I think this is the best feature of this new commands, because you can
do collation anyway application side.
For example, if you want to preserve case or accents in the actual
string, but want to index it without caring about those attributes
from the point of view of the ordering, just add stuff like this:

foobar:FòòBAR
flipping:FlippinG

And so forth. Ask for ranges, and present to the user what is after the colons.
So you can still do the collation stuff, but what you gain with binary
comparison? That you have a general purpose indexing engine.

Want to do range queries on 128 bits integers? Well store them in
network byte ordering using 16 bytes followed by any value item you
need to index as "value", if any. As a side effect they'll be sorted.
You can do it even using decimal numbers in ASCII left-padding with zeroes:

127.0.0.1:6379> zadd zset 0 010 0 100 0 001 0 200 0 117 0 025
(integer) 6
127.0.0.1:6379> zrangebylex zset [010 [100
1) "010"
2) "025"
3) "100"

You can do a great deal of stuff with this, like building a graph
database [1], as Matteo Collina is doing with LevelUp.

[1] https://github.com/mcollina/levelgraph

I've some feeling Redis is going to be a popular completion engine
soon with this changes :-)
Moreover if people had some good time doing secondary indexes with
Redis, this is, really, a level up!

Feedbacks as always welcomed.

p.s. the implementation currently lacks tests, I'll add them in the next days.

Cheers,
Salvatore

--
Salvatore 'antirez' Sanfilippo
open source developer - GoPivotal
http://invece.org

To "attack a straw man" is to create the illusion of having refuted a
proposition by replacing it with a superficially similar yet
unequivalent proposition (the "straw man"), and to refute it
-- Wikipedia (Straw man page)

Josiah Carlson

unread,
Apr 5, 2014, 3:50:14 PM4/5/14
to redi...@googlegroups.com
I've been using this sort of functionality on-and-off using vanilla Redis and Lua scripting for a few years now, so I have a couple comments :)

As I mentioned in the other thread about ZLEX to tw-bert, it might make sense to offer scores as a pre-selection mechanism for those of us who know how to use them. It can make a difference for insertion times and even scanning. It could also be added later without harming anyone if anyone else wants it (no need to add it just for me :P), or excluded because those of us who know how to use them are already using Lua scripting for it.

If score + lex scanning is supported, obviously a "withscores" option would make sense.

The use of '-' and '+' along with '(' and '[' as symbols with special meanings precludes them as being part of the lexicographic scanning as the only character in the string, and in the case of '(' and '[', as being the leading part of a string too. To alleviate this, it might make sense to allow for doubling of the characters for escaping. Say, '--' to start/end scanning from the '-' character, '++' for '+', '((' for '(', and '[[' for '['. Note that this would only be applicable for strings/prefix matches where those characters were the *first* character in the input string.


As a general comment, you may want to consider the addition of a bit of specific functionality to the commands. Give me a moment to explain. Conceptually, the idea that is to be encompassed by Z[REV]RANGEBYLEX is that of a general index, which is typically implemented using a B+Tree in a relational database. The particular operations that are typically useful in such a situation for both insertion, removal are:
* Add (optionally unique) index entry X that maps to value Y
* Remove index entry X that maps to value Y
* Give me the entry/value pair(s) that match this provided range (where the start/end can be the same)

For the inserted data, typically there contains a "mapped" value Y, which in the ZSET-as-index case, the mapped value is embedded in the index entry. That's perfectly reasonable from a data schema perspective (and actually some databases do the same thing), but where you run into issues is that to have *any* use of the data, you must post-process.

Post-processing isn't a big deal for many operations, all you want is to match a prefix and return the results - getting what was matched is a convenient bonus. But as one step in a sequence of operations, that mapped value Y will usually be an id of some sort, which will have similar entries as members of other SETs or ZSETs. Those ids are there for subsequent intersection, union, and filtering operations, or even direct object access by id. How I handle it in my object mapper is to just process all of the prefix/suffix/pattern matching (they all use the same mechanism as best they can) in Lua, which allows me to extract the mapped value Y from the entries for subsequent intersection/union calculations.

Having the ability to extract the mapped Y value, and optionally create a new SET and/or ZSET inside Redis with the results (SET for unique unordered, ZSET for unique ordered - score would be the position in the original ZSET) would be pretty awesome. For the destination key, a 'STORE' option like 'SORT' would work. And for the mapped value Y extraction, an optional separator string could be included. Something like...

ZRANGEBYLEX ... STORE <destkey> SEP <sep>

In Lua the processing would look something like...

-- items are those matching items that need to be filtered
local items = ...
for i, item in ipair(items) do
    local id = string.match(item, sep.."(.*)$")
    redis.call('ZADD', destkey, i, id or item)
end

The above also adds the ZSET storage piece with index-based scores :)


And while I'm on the topic, a recent poster mentioned that it is not uncommon when performing intersection/union operations on ZSETs to only want a specific range from one (or all) of your input ZSETs. Right now the best way to handle this is to either copy the whole ZSET and delete what you don't want (what my object mapper does), or to explicitly copy that subset of data and then perform your operations (I hacked this up as an example last week: https://gist.github.com/josiahcarlson/9849543 ). Being able to use a "RANGES" argument like "WEIGHTS" to handle this would be pretty amazing.

 - Josiah


--
You received this message because you are subscribed to the Google Groups "Redis DB" group.
To unsubscribe from this group and stop receiving emails from it, send an email to redis-db+u...@googlegroups.com.
To post to this group, send email to redi...@googlegroups.com.
Visit this group at http://groups.google.com/group/redis-db.
For more options, visit https://groups.google.com/d/optout.

Salvatore Sanfilippo

unread,
Apr 5, 2014, 4:48:34 PM4/5/14
to Redis DB
Hello Josiah, thanks for the great feedback! I'll check it in dept but
want to reply ASAP to two important points:

On Sat, Apr 5, 2014 at 9:50 PM, Josiah Carlson <josiah....@gmail.com> wrote:
> The use of '-' and '+' along with '(' and '[' as symbols with special
> meanings precludes them as being part of the lexicographic scanning as the
> only character in the string, and in the case of '(' and '[', as being the
> leading part of a string too

I could never do something so atrocious! You are totally right this
would be an error.
In the case of the implementation I proposed, a one-char prefix is
mandatory, so the string must start for [ or ( and later there is the
actual payload. Or it can be *just* + or -, since +foo or -bar will
raise an error.

So the API is totally safe and predictable and any binary string can be used.

About the option to specify the score, that was planned! I agree that
not everybody will understand it at a first glance, but the thing is,
sorted sets are actually indexes of indexes, the first level is the
score, and the second the lexicographical range.

One could say, what's the deal here? I can use another key if I want
another independent data structure.
Well it is not quite like this as:

1) In a single sorted set with multiple scores it is *guaranteed* that
there are no repeating elements.
2) You can move scores from one sub-set to the other
incrementing/decrementing the score.

This can be quite interesting, like, give me the range of words
between foo an bar that were previously encountered exactly ten times.
Or stuff like that.

I'm not adding it now but considering it as an option because the
problem is that the score is a float, so there is to check if it is a
bit too easy to shoot yourself in the feet by incrementing a score to
discover it is actually in a different range because of floating point
precision issues or stuff like that.

More soon and thanks again,

Josiah Carlson

unread,
Apr 5, 2014, 9:11:16 PM4/5/14
to redi...@googlegroups.com
Great news about the '-', '+', '(', and, '[', I seem to have misread and/or misinterpreted what you'd written :)

Also awesome about the scores in ZRANGEBYLEX - the variant usage you describe is interesting. One thing that you may want to specify explicitly before adding the feature is whether or not Redis will scan items with different scores, and possibly skip over entries that don't fit the range. In particular, the use that I have for it uses scores as a pre-match for data in the lexicographic order. Which is to say, I effectively store the first 7 characters of what is in the member as the score, which I use as pre-matches on the member. Now, the members are also all stored lexicographically as though the scores were all 0 due to the way I construct the score, so there would be no skipping necessary, but if someone were to have the following example data:

ZADD test 0 boo 0 cat 0 hat 1 barge 1 car

Would a "ZRANGEBYLEX test [bar [car SCORERANGE 0 1" call, where SCORERANGE implies that the next 2 values are the inclusive scores, return [boo, barge, car]? I would argue that it should only return [boo] for the sake of sanity, as "cat" should stop the matching, and allowing for disconnected runs of matches is both inefficient (you've actually got to scan all of the entries) and non-intuitive.

 - Josiah





Dvir Volk

unread,
Apr 6, 2014, 10:32:42 AM4/6/14
to redi...@googlegroups.com
This is very interesting. 
How is implemented internally? Are you keeping a trie like structure or iterating over the set with O(N) complexity?


Josiah Carlson

unread,
Apr 6, 2014, 2:47:31 PM4/6/14
to redi...@googlegroups.com
Neither. It's a ZSET. If all scores are the same, ZSETs are ordered by their members. And generally when scores are the same (say you've got two members with scores of 4.545642), they will be compared by their members. As described, the feature uses that secondary sorting by members to offer prefix range queries. Salvatore talked a bit about the functionality way back in 2010 [1], and I among others improved it to not require adding variants of every string you want, which you can find in my book. Redis-only code available at [2], Lua scripting variant here available at [3]. For my Redis object mapper, I use the scores as a pre-filter for the character prefixes (converting the first 7 characters to a score), and toss in pattern matching along with the 'mapped value/id' extraction in [4].

To find the range for AA to BB with the new commands that Salvatore added, internally Redis pretends as though you are inserting those values into the ZSET, but instead of inserting them, it just grabs a pointer for the item that would have come immediately after the starting insertion. Then it iterates from that value, comparing its member/score against the range you are interested in. It's an initial O(logn) query, followed by a linear scan over those elements that are to be returned, stopping when it discovers an element outside the range.

Some Redis-only or Lua scripting variants insert entries to find the endpoints ([2] and [3]), and others use scores to discover them ([4]), so the non-native variants will have more O(logn) queries to potentially get the same job done. Though currently the Lua scripting versions can offer more functionality, among which I described in my first post to the thread, which describes what [4] does.

 - Josiah

Salvatore Sanfilippo

unread,
Apr 6, 2014, 4:49:42 PM4/6/14
to Redis DB
Thanks Josiah,

I want to just add that ranges in O(log(N)) seek + O(M) to scan
elements is optimal in the context of the API exposed, you can do
better only with a linear array indexed by an integer index where you
have O(1) + O(M).
And that because of the peculiarities of sorted set implementations,
where we used an augmented data structure with rank information, and
an associated hash table with entry -> score mapping, we have other
interesting non common features:

1) A rank operation is available, again with O(log(N)) complexity.
2) To retrieve the score of an element is O(1).
3) Range-size operation (ZCOUNT) is O(log(N)), instead of the common
(log(N)+M), since it uses the fact that you can instead two ranks and
subtract them, because of "1".
4) get-min-element operation is O(1) because of the way skiplists are organized.

Salvatore

Salvatore Sanfilippo

unread,
Apr 6, 2014, 4:50:50 PM4/6/14
to Redis DB
On Sun, Apr 6, 2014 at 10:49 PM, Salvatore Sanfilippo <ant...@gmail.com> wrote:
> 2) To retrieve the score of an element is O(1).

And perhaps more interesting testing for existence is O(1) as well.

Josiah Carlson

unread,
Apr 6, 2014, 5:01:52 PM4/6/14
to redi...@googlegroups.com
On Sun, Apr 6, 2014 at 1:49 PM, Salvatore Sanfilippo <ant...@gmail.com> wrote:
Thanks Josiah,

I want to just add that ranges in O(log(N)) seek + O(M) to scan
elements is optimal in the context of the API exposed, you can do
better only with a linear array indexed by an integer index where you
have O(1) + O(M).

I think you are skipping a few steps in your description, as I'm pretty sure the only way you can guarantee O(1) lookups for where to start is if either 1) you already know the index of the entry you are looking for in that array, 2) you use a hash table and know what you are looking for is in the hash, or 3) you use a trie (which you then need to describe running time based on the length of the data you are looking for, and pointer chasing makes this generally slower than a comparison-based method for most cases).

 - Josiah

Salvatore Sanfilippo

unread,
Apr 6, 2014, 5:07:44 PM4/6/14
to Redis DB
On Sun, Apr 6, 2014 at 11:01 PM, Josiah Carlson
<josiah....@gmail.com> wrote:
> I think you are skipping a few steps in your description, as I'm pretty sure
> the only way you can guarantee O(1) lookups for where to start is if either
> 1) you already know the index of the entry you are looking for in that array

Yes we are talking about the same thing, what I mean is that you can
do better if the API does not take an element or score, but the index
inside the array :-)
In Redis we have no such a fundamental data structure (the Array)
since at the Redis level of abstraction is rarely useful, and,
SETRANGE/GETRANGE are often near enough.

Mohamed Wael Khobalatte

unread,
Apr 6, 2014, 11:13:39 AM4/6/14
to redi...@googlegroups.com
Great stuff Salvatore. Though I am not sure what you meant when you said that this will be a "popular completion engine". Do you mean autocompletion engines? And if so, are there instances you have in mind where this functionality could make a better completion engine out of the ones using Zsets? (The one that comes to mind is yours that you developed in here[1]). My colleagues and I run a solid autocompletion engine ourselves out of regular redis commands, so it's good to understand how this could make it even more awesome than it already is. 

--
Mohamed Wael Khobalatte

Salvatore Sanfilippo

unread,
Apr 6, 2014, 5:35:45 PM4/6/14
to Redis DB
Hello Mohamed, my example in [1] needs a lot more memory that could be
otherwise required (like an order of magnitude more), and returns a
lot of useless entries. Also this was a trick to make a completion
engine but does not scale to general purpose indexing that is possible
with ZRANGEBYLEX. So it is definitely much better now! :-)

Btw ZRANGEBYLEX will be back ported to 2.8 as well, like the
HyperLogLog implementation, so this will be usable ASAP.

Salvatore

tw-bert

unread,
Apr 7, 2014, 12:38:04 AM4/7/14
to redi...@googlegroups.com
Hi Salvatore, this is great.

Seems I can throw my ZLEX pull request in the bin (if only I'd waited a while... sigh, can't look into the future alas).

I do want to put emphasis on Josiah's request to use encoded scores (and 'walk' through the scores before walking through the skiplist).
We used a similar approach (encoding the first few characters as a IEEE 753 FP double), float comparison seemed quicker that walking the skip list.

I had a new design ready for ZLEX that facilitates this:
  1. BRSTARTS
    replaces: POSTALPHABREAK
    Break at starts-with
  2. CSCORE 0
    replaces: ALLSCORES; Return members with higher scores as well.
    “Continued-Score” or “Coded-Score”
  3. CSCORE '2.6686011349572752e-303'
    new functionality:
  4. Returns up until (and including) the given score. Don't return members with higher scores.
  5. Within the given score, BRSTART becomes active.
Obviously, I'm going to stall implementing this, no need now.

If you missed my post about ZLEX, [here](https://groups.google.com/forum/?fromgroups=#!topic/redis-db/KMKIPtyoJ-w) is the thread, and [here](https://github.com/antirez/redis/pull/1634) is the pull request.

Thanks again, TW

tw-bert

unread,
Apr 7, 2014, 12:40:38 AM4/7/14
to redi...@googlegroups.com
Ouch.., sorry about including the complete original post by Antirez in my reply.

tw-bert

unread,
Apr 7, 2014, 12:54:03 AM4/7/14
to redi...@googlegroups.com
Hi Salvatore,

Addendum to my previous post:

ZLEX has the following functionality:
The start parameter may be negative, to get a preroll of sorted set elements before the actual start (total number of results will never be more than stop - start + 1).

It is (in my opinion) a fairly common situation when dealing with pagination.
What are your thoughts on this?

I'd be happy to lend a hand implementing it ofcourse.

Kind regards, TW






Salvatore Sanfilippo

unread,
Apr 7, 2014, 2:32:52 AM4/7/14
to Redis DB
On Mon, Apr 7, 2014 at 6:38 AM, tw-bert <tw.be...@gmail.com> wrote:
> Hi Salvatore, this is great.
>
> Seems I can throw my ZLEX pull request in the bin (if only I'd waited a
> while... sigh, can't look into the future alas).
>
> I do want to put emphasis on Josiah's request to use encoded scores (and
> 'walk' through the scores before walking through the skiplist).
> We used a similar approach (encoding the first few characters as a IEEE 753
> FP double), float comparison seemed quicker that walking the skip list.

Hello!

I'm sorry that we duplicated work, however as a partial consolation
you should check my commit date: what I committed is code from 2012
:-)
So there was a duplication problem when you posted your PR 14 days ago already.
Well we can use this "race condition" to improve the design perhaps.

About walking by score, if it is for the sake of speed, I'm definitely
skeptical as a first reaction unless I see benchmarks, a profiling
attempt, an optimization effort, and then new benchmarks :-)
I *never* make an API more complex for the sake of speed unless it is
an order of magnitude difference that I can't help with other means.

My reasoning for adding a "SCORE <score>" option (note, not a range, a
single value) was to conceive the sorted with as many small
lexicographically ordered sets. This makes sense because it is
semantically interesting, you can move entries from one sub-set to
another acting in the score with ZINCRBY.

I want to make an actual example otherwise this may seem like
far-fetched. You have a completion engine and you are interested in
showing suggestions based on popularity.
You can take a sorted set where score is the number of times you
encountered a given request, every time you reach a new order of
magnitude when incrementing (think like 10, 100, 1000, 10000...)
you increment the element in the other sorted set you use for autocompletion.

When autocompleting, you check the max score in the sorted set and
query with "SCORE <score>" first, if there are not enough entries, you
check with score-1, and so forth.

tw-bert

unread,
Apr 7, 2014, 2:55:45 AM4/7/14
to redi...@googlegroups.com
Hi Salvatore,

I'm sorry that we duplicated work, however as a partial consolation
you should check my commit date: what I committed is code from 2012
:-)
So there was a duplication problem when you posted your PR 14 days ago already.
Well we can use this "race condition" to improve the design perhaps.

Agreed + no worries. I did see the Github Issue here: https://github.com/antirez/redis/issues/324 , but had no idea most of the code was already committed.
I should have queried the changelog.
My implementation is mostly a subset of ZRANGEBYLEX anyway, and your implementation most probably outclasses mine by a mile or two.
In short: I'm very happy, we needed this functionality, so I wrote it myself. But now it's there 'by magic' at the right time, and soon part of the stable. That's much better than having to patch and keep up.

You could grab some of my TCL tests, they might be of use.

About walking by score, if it is for the sake of speed, I'm definitely
skeptical as a first reaction unless I see benchmarks, a profiling
attempt, an optimization effort, and then new benchmarks :-)
I *never* make an API more complex for the sake of speed unless it is
an order of magnitude difference that I can't help with other means.

That's the way to go, I completely agree.
Ignore my remark about walking scores, my tests were not elaborate at all. My question mainly was an afterthought after a small (digital) conversation with Josiah, Josiah's 'rom' ORM, and a similar approuch we took ourselves before writing ZLEX. The only goal was speed improvement by the way, nothing semantical. The simpler the better.

Thanks again, TW


Salvatore Sanfilippo

unread,
Apr 7, 2014, 3:26:41 AM4/7/14
to Redis DB
Hello again,

On Mon, Apr 7, 2014 at 8:55 AM, tw-bert <tw.be...@gmail.com> wrote:

> Agreed + no worries. I did see the Github Issue here:
> https://github.com/antirez/redis/issues/324 , but had no idea most of the
> code was already committed.
> I should have queried the changelog.

To be honest the code saved itself in some way for 2 years sitting in
a local branch in my disk... I had no longer idea when I grepped the
branches and saw it was already finished for the skiplist side, and
almost finished for the ziplist implementation, but never competed. I
just fixed a few things and the code is working since then totally
stable, because it was a one-to-one map with the BYSCORE stuff so we
inherited the stability of the old code.

Just as an historical note on how I, and I bet most programmers change
their minds over time, I stopped the work because I was too concerned
with collation. Only later I realized not handling the collation was
like not handing encoding in Redis: a great idea...

> My implementation is mostly a subset of ZRANGEBYLEX anyway, and your
> implementation most probably outclasses mine by a mile or two.

Oh if this is the case it's up to Pieter's refactoring of my original
code, I followed the same schema for my implementation.

> In short: I'm very happy, we needed this functionality, so I wrote it
> myself. But now it's there 'by magic' at the right time, and soon part of
> the stable. That's much better than having to patch and keep up.
>
> You could grab some of my TCL tests, they might be of use.

Indeed, thanks

> That's the way to go, I completely agree.
> Ignore my remark about walking scores, my tests were not elaborate at all.
> My question mainly was an afterthought after a small (digital) conversation
> with Josiah, Josiah's 'rom' ORM, and a similar approuch we took ourselves
> before writing ZLEX. The only goal was speed improvement by the way, nothing
> semantical. The simpler the better.

Thanks for the follow up about that. I did some benchmarking. The
hardware is bare-metal typical Linux box running a recent kernel:

src ➤ redis-benchmark -P 32 -q -n 1000000 zrangebyscore z2 49999050 49999800
zrangebyscore z2 49999050 49999800: 128667.02 requests per second
src ➤ redis-benchmark -P 32 -q -n 1000000 zrangebylex z1'[0000499981'
'[000049999'
zrangebylex z1 [0000499981 [000049999: 112599.94 requests per second

Sorted sets cardinality is 1 million each, created with the same
random number generator.

There is little speed difference when accessing, both ranges are
designed to return 6 elements.

However I found that when creating the sorted sets there was a bigger
difference, ZADD performed better when scores where different.
Maybe this is because using the score it is like if the tree has
height+1 compared to the flat score but I did not profiled.

Salvatore

--
Salvatore 'antirez' Sanfilippo
open source developer - GoPivotal
http://invece.org

To "attack a straw man" is to create the illusion of having refuted a
proposition by replacing it with a superficially similar yet
unequivalent proposition (the "straw man"), and to refute it
— Wikipedia (Straw man page)

Josiah Carlson

unread,
Apr 7, 2014, 4:29:49 AM4/7/14
to redi...@googlegroups.com
The performance difference is 14% for the zrangebyscore vs. zrangebylex. Also, if you are thinking about supporting SCORE for a single score, why not for two? And call it SCORES. Or keep it as SCORE. Give it the same semantic for endpoints as ZRANGEBYSCORE, and it offers a superset of the functionality of ZRANGEBYSCORE.

Heck, I *might* argue that it would make sense to just augment/expand the ZRANGE and ZREMRANGE API:

ZRANGE <key> [[LIMIT] <start> <stop>] [SCORE <score start> <score stop>] [LEX <lex start> <lex stop>] [WITHSCORES]
ZREMRANGE <key> [[LIMIT] <start> <stop>] [SCORE <score start> <score stop>] [LEX <lex start> <lex stop>]

By using LIMIT and omitting all others, you get ZRANGE. By using SCORE and omitting all others, you get ZRANGEBYSCORE. By using LEX and omitting all others, you get ZRANGEBYLEX as originally described by Salvatore. Combine SCORE and LIMIT and you have pagination over score ranges. Combine LIMIT and LEX and you have pagination over autocomplete lists. Combine SCORE and LEX and you have optimized autocomplete list scanning. Combine SCORE, LEX, and LIMIT and you can paginate over your optimized autocomplete list scanning.

ZRANGE is going to be backwards compatible by making "LIMIT" be optional in some cases. If LIMIT is not present, then two numeric values can be present as the 2nd and 3rd arguments to ZRANGE (the two immediately after <key>). In that situation, either it performs like historic ZRANGE, or it is modified by SCORE and LEX arguments, and it offers pagination over the modified variant semantics. If LIMIT is omitted along with the relevant <start> <stop> arguments, then a complete SCORE and/or LEX clause must be present.

Oh, and you get the same behavior with ZREMRANGE. 

To go even farther over the top, let's add more to the ZRANGE side. Pass COUNT to count the number of values like ZCOUNT (getting the full selection and limiting API). Pass SUM to sum the scores, which is something that a *lot* of people have been wanting for a long time. You could pass SCAN and MATCH and get ZSCAN (though maybe this one is better left separate). Add STORE to make a quick copy of a subset. 

Note that I am not seriously proposing this, just tossing it out there as the ultimate Unix-y tool: the commands have a bazillion arguments, so there aren't very many of them. Please don't do this. Maybe consider it a late April Fool's joke. Or maybe consider that just adding one more option to ZRANGEBYLEX isn't all that bad. :D

 - Josiah




tw-bert

unread,
Apr 7, 2014, 4:35:34 AM4/7/14
to redi...@googlegroups.com


On Monday, 7 April 2014 09:26:41 UTC+2, Salvatore Sanfilippo wrote:
To be honest the code saved itself in some way for 2 years sitting in
a local branch in my disk...
Ah well, shit happens :).
I did try to contact you before I started coding, just saw your post about gmail communication mishap, again... sh!t happens.
Thank you for following up on this tiny issue, appreciated.

Just as an historical note on how I, and I bet most programmers change
their minds over time, I stopped the work because I was too concerned
with collation. Only later I realized not handling the collation was
like not handing encoding in Redis: a great idea...

Yes, collation/tokenization shouldn't be the responsability of redis.
 It could be built in for "ease of use", but there is a risk of bloating it's core functionality.
 
> My implementation is mostly a subset of ZRANGEBYLEX anyway, ...

Oh if this is the case it's up to Pieter's refactoring of my original
code, I followed the same schema for my implementation.

I *didn't*. I convinced the company I work for (rightfully so) that this functionality was needed, and got the space to implement what we (as a company) need. Not the complete specs. Happily, the plan was always to give it back to the community.

I did some benchmarking. The
hardware is bare-metal typical Linux box running a recent kernel:

src ➤ redis-benchmark -P 32 -q -n 1000000 zrangebyscore z2 49999050 49999800
zrangebyscore z2 49999050 49999800: 128667.02 requests per second
src ➤ redis-benchmark -P 32 -q -n 1000000 zrangebylex z1'[0000499981'
'[000049999'
zrangebylex z1 [0000499981 [000049999: 112599.94 requests per second

Sorted sets cardinality is 1 million each, created with the same
random number generator.

There is little speed difference when accessing, both ranges are
designed to return 6 elements.

Good news. Just out of curiousity, I did the same redis-benchmark with ZLEX, and the results are very similar.

However I found that when creating the sorted sets there was a bigger
difference, ZADD performed better when scores where different.
Maybe this is because using the score it is like if the tree has
height+1 compared to the flat score but I did not profiled.

Interesting. The head of the skip list might get a level +1 is my guess, which is probably exactly the same as you just said.

Kind regards, TW

Matteo Collina

unread,
Apr 7, 2014, 5:30:26 AM4/7/14
to redi...@googlegroups.com
Hi Salvatore,

This is awesome! :D It landed two days later than we discussed it.

I've tested the new command, and it works awesomely with redis-cli and a low number of elements in there (I'm not bulk-loading it at the moment). I'm eager to try levelgraph on top of it.
ZRANGEBYSCORE accepts a limit/offset combination, is it present also in zrangebylex?

Cheers,

Matteo

tw-bert

unread,
Apr 8, 2014, 7:33:34 PM4/8/14
to redi...@googlegroups.com
@Matteo Collina:

Yes, the [LIMIT offset count] option is supported, I just checked. A negative offset is however not supported.

The [WITHSCORES] option however, is not supported.


@Salvatore Sanfilippo:

Any reason why you left [WITHSCORES] out? Implementation is trivial, as you well know ;)
A negative offset : should also be easy to implement, what are your thoughts on this? See my previous post in this thread about 'a preroll'.

Also let me know if I can help implementing.

Cheers, TW

tw-bert

unread,
Apr 8, 2014, 7:45:24 PM4/8/14
to redi...@googlegroups.com
Aaaah, forget that [WITHSCORES] comment. Doing too many things at once. Makes no sense at all, sorry about that.

Hugues Malphettes

unread,
Jun 27, 2014, 11:53:16 PM6/27/14
to redi...@googlegroups.com
Hi Matteo and Salvatore,

The test suite is green.
It actually completes 20% faster with redis as a backend than with leveldown.

I stumbled on this thread and thought you guys might be interested.

Cheers,
Hugues

[1] I hacked level-test to force it to always use redisdown instead of memdown or leveldown

Salvatore Sanfilippo

unread,
Jun 28, 2014, 8:29:34 AM6/28/14
to Redis DB
Cool stuff! Thank you. Was it already added in the list of LevelUP backends?


--
You received this message because you are subscribed to the Google Groups "Redis DB" group.
To unsubscribe from this group and stop receiving emails from it, send an email to redis-db+u...@googlegroups.com.
To post to this group, send email to redi...@googlegroups.com.
Visit this group at http://groups.google.com/group/redis-db.
For more options, visit https://groups.google.com/d/optout.

--
Salvatore 'antirez' Sanfilippo
open source developer - GoPivotal
http://invece.org

To "attack a straw man" is to create the illusion of having refuted a proposition by replacing it with a superficially similar yet unequivalent proposition (the "straw man"), and to refute it
       — Wikipedia (Straw man page)

Matteo Collina

unread,
Jun 28, 2014, 8:39:43 AM6/28/14
to redi...@googlegroups.com
Hi Huges,

Thank you for bringing my idea to life! These have been busy months and I did not have time for building it :).
I'm very happy we can now use Redis as a backend for LevelUp!

Cheers,

Matteo

Hugues Malphettes

unread,
Jun 28, 2014, 9:44:21 AM6/28/14
to redi...@googlegroups.com

Cheers!

It is already in the list of LevelUP backends: https://github.com/rvagg/node-levelup/wiki/Modules

Hugues

You received this message because you are subscribed to a topic in the Google Groups "Redis DB" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/redis-db/GrCM183WkxM/unsubscribe.
To unsubscribe from this group and all its topics, send an email to redis-db+u...@googlegroups.com.
Reply all
Reply to author
Forward
0 new messages