ZLEX - New redis-server command. Feedback welcome.


tw-bert

Mar 23, 2014, 2:53:51 PM
to redi...@googlegroups.com
First of all: my colleagues and I have slowly grown into serious Redis addicts over the past year :) Thank you for this elegant and robust NoSQL solution.

I'm integrating Redis into a larger project, where it caches parts of our big RDBMS-based custom logistics application so they can be used through an N-tier architecture, including the web.

During the design phase, we found that we were missing important Redis core functionality regarding indexes.
Therefore I implemented it during the last two months (not continuously of course, but still no small sub-project).

I'd like to make it available for the community. Please provide feedback.

Here are the functional specs: GitHub - ZLEX-redis-server-extension

For some more background info, if interested, see: SO - alphabetical-index-with-millions-of-rows-in-redis

I'm doing some testing today, I'll probably push the patches to the tw-bert repo tomorrow.
As stated in CONTRIBUTING, it's best to discuss it here first.

TIA, cheers, TW



tw-bert

Mar 27, 2014, 3:45:54 PM
to redi...@googlegroups.com
I had a bit of trouble posting to this group.
In the meantime, I've posted a pull request here: https://github.com/antirez/redis/pull/1634 

Thank you for any feedback, kind regards, TW

Josiah Carlson

Mar 28, 2014, 6:29:35 PM
to redi...@googlegroups.com
This is similar to the method I used for autocomplete in my book and in my Python/Redis object mapper rom.


It wouldn't be difficult to limit it to a fixed number of entries. Something to consider :)

 - Josiah




tw-bert

Mar 28, 2014, 8:39:22 PM
to redi...@googlegroups.com
Hi Josiah,

Thank you for your reply.

It is already limited to a fixed number of entries, so I'm not sure what you mean. Can you elaborate? Are my specs unclear?

Your autocomplete function is very good, but not the same (in my opinion). I had a use case of 10 million members in a sorted set where I needed to zoom in on the correct index entry (alphabetically) instantly, without overhead in the write part. Have you read the SO post I refer to?

With ZLEX, the 'zoom in' part is achieved in an average of 7 or 8 skiplist hops (with 10 million rows of identical score), which is very little, thus extremely fast. There is no mapping involved whatsoever, so the write part is 'just a ZADD' and nothing more.
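
To make the usage concrete, here is a rough sketch with redis-py (the key and member names are invented for the example, the ZLEX call of course only works against a server with the patch applied, and the exact reply format is described in the GitHub spec rather than here):

import redis

r = redis.StrictRedis()

# Write path: every member gets the same score, so indexing a row is
# nothing more than a plain ZADD.
r.execute_command('ZADD', 'idx:customers:name', 0, 'bernard\x00cust:1017')
r.execute_command('ZADD', 'idx:customers:name', 0, 'bob\x00cust:2044')

# Read path: the proposed ZLEX seeks to the first member >= 'b' via the
# skiplist and returns a window of entries around that cursor position.
entries = r.execute_command('ZLEX', 'idx:customers:name', 0, 'b', 0, 9,
                            'BREAKMODE', 'POSTALPHASTOP')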

Kind regards, TW

Josiah Carlson

Mar 29, 2014, 1:27:08 AM
to redi...@googlegroups.com
On Fri, Mar 28, 2014 at 5:39 PM, tw-bert <tw.be...@gmail.com> wrote:
Hi Josiah,

Thank you for your reply.

It is already limited to a fixed number of entries, so I'm not sure what you mean. Can you elaborate? Are my specs unclear?

Generally speaking, it is difficult to get new commands added to Redis.

I linked to my Lua script because in the case that Salvatore decides not to accept your patch, you have an alternative that doesn't involve maintaining your own version of Redis. My talking about adding a limit is with regards to my script, which currently finds all matches (as well as general pattern matches, optimized to prefix lookups when possible).

Your autocomplete function is very good, but not the same (in my opinion). I had a use case of 10 million members in a sorted set where I needed to zoom in on the correct index entry (alphabetically) instantly, without overhead in the write part. Have you read the SO post I refer to?

With ZLEX, the 'zoom in' part is achieved in an average of 7 or 8 skiplist hops (with 10 million rows of identical score), which is very little, thus extremely fast.

There is no free lunch and a skiplist isn't magic. Counting how many "hops" you perform through individual skiplist nodes doesn't accurately measure the time that the operation uses. Even if you don't step to another skiplist node, you are still checking *all* levels in the skiplist (which should be 11-16 levels for a 10m entry skiplist), which means that you are looking at roughly 2*hops + levels of uncached pointer references and actual value comparisons. That gets you 25 to 32 slow operations to find your entry, which is roughly expected of any comparison-based structure. Technically speaking, skiplists are approximately as efficient as a balanced binary search tree.
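
Back-of-the-envelope (a rough sketch, assuming the default skiplist level promotion probability of 0.25 that Redis uses, ZSKIPLIST_P):

import math

n = 10 * 1000 * 1000   # entries in the sorted set
p = 0.25               # skiplist level promotion probability (ZSKIPLIST_P)

# The expected height of the tallest node is roughly log base (1/p) of n,
# which is where the 11-16 level figure above comes from.
print(math.log(n, 1 / p))   # ~11.6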

Also, the code is a 99% copy/paste from a prefix of the zslInsert() function just above it, and Salvatore makes no claims as to beating a binary tree in terms of general efficiency. On a related note, it might be better to refactor the code so that there is only one function with that block rather than two.
 

There is no mapping involved whatsoever, so the write part is 'just a ZADD' and nothing more.

There are a lot of moving pieces to what my entire library does, but one of the simpler pieces is the prefix (and suffix) lookup code. It generates a score based on a 7 character prefix (or suffix), and uses that as an initial entry into the ZSET for scanning entries. Aside from the score generation, adding to Redis itself is a simple ZADD, and scanning is just running the Lua script. The score and my use of the null character \0 instead of tab \t is the major difference between what you have described as being one of the primary uses, and what I have previously implemented using Lua.

I have also previously used scores of 0 and used insertion/removal to find indexes (actual inserting instead of the zslInsertPretend() function that you added), which was the version in my book, but it's not as efficient* and ends up writing to the ZSET unnecessarily.

* Using scores of 0 in order to force lexicographic ordering on keys wastes the score comparison operation without substantive gain. What ends up happening is that rather than using the fast double comparison to make a decision, you defer to the slow string comparison operation while also doing a wasted double comparison. If you use scores as a pre-match, you can avoid basically all of the "find the starting index" string comparisons unless you have a lot of strings that match on more than a 7 character prefix.
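
To make the score idea concrete, an illustrative sketch (Python; this is not the actual code from rom, and the ALPHABET, BASE and prefix_score names are invented for the example; it assumes a small alphabet so that 7 characters still fit inside the 53 bits of integer precision a double gives you):

ALPHABET = '\x00abcdefghijklmnopqrstuvwxyz0123456789 '   # 38 symbols, illustrative order
BASE = len(ALPHABET)                                      # 38**7 < 2**53, so it fits a double

def prefix_score(value, length=7):
    # Encode the first `length` characters as a base-38 integer, so a
    # numerically smaller score means a lexicographically smaller prefix
    # (with respect to the alphabet order above).
    value = value.lower()[:length]
    score = 0
    for ch in value:
        score = score * BASE + (ALPHABET.index(ch) if ch in ALPHABET else 0)
    # Pad short values so e.g. 'bob' still sorts before 'bobcat'.
    return score * BASE ** (length - len(value))

Writes stay a plain ZADD with that score, and a lookup can seek straight to the score of the query's prefix, only falling back to string comparisons for members that share the full 7 character prefix.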


But to comment on your specification, here are the issues that I see:

> ZLEX key score asciistring start stop [BREAKMODE {[POSTALPHASTOP|NOBREAK]}] [WITHSCORES] [WITHHEADER]

The "BREAKMODE" term is unnecessary, you can use POSALPHASTOP or NOBREAK by themselves unambiguously.

Also, despite English being my first language, POSTALPHASTOP is very difficult to mentally parse to Post Alpha Stop, and my initial reading made me wonder "What is a Postal Pha Stop?" Is it a postal worker stopping off for some misspelled Vietnamese soup? But anyway, the default behavior should end when the prefix stops matching, and NOBREAK should be the optional "add this flag if you want to continue past the typical match".

You haven't specified what happens in the case of NOBREAK if the scores are the same but the prefix doesn't match. The wording of POSTALPHASTOP would suggest that NOBREAK would continue even if scores were the same and the prefix doesn't match, but that's not clear. Either way, toss BREAKMODE and POSTALPHASTOP, and only use NOBREAK.

> The start parameter may be negative, to get a preroll of sorted set elements before the actual start (total number of results will never be more than stop - start + 1).
Generally, Redis uses negative indexes to index from the *end* of the LIST or ZSET. Using a negative value to represent "earlier than the first entry that matches the prefix" is a strange semantic in the ZSET context.
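
A trivial, self-contained illustration of that established semantic (key and members made up):

import redis

r = redis.StrictRedis()
r.execute_command('ZADD', 'zkey', 0, 'a', 0, 'b', 0, 'c', 0, 'd')
r.execute_command('ZRANGE', 'zkey', -2, -1)   # returns the last two elements: c, d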

> Real-life example - EX3; auto-suggest lists
In your example, you enter "b", but it returns "Bernard" and "Bob", neither of which would have matched.


 - Josiah

tw-bert

Mar 29, 2014, 8:35:38 AM
to redi...@googlegroups.com
Hi Josiah, excellent feedback, thank you for your insights. *Very* valuable.

'Generally speaking, it is difficult to get new commands added to Redis.' -> noted, and understandable. I do have some hopes, because of the ZRANGEBYLEX proposal issued by Antirez himself. 

'rom': Thanks for pointing out your alternative. 'rom' looks very good (far better than another mapper, 'stdnet', that I tried to adopt), but I don't need the data inside Python, at least not in most use cases. The data streams from a proprietary language (Progress ABL) through a named pipe into Redis. I do use an embedded Python thread as a broker, but not for the data. I will look into 'rom', and will definitely adopt some ideas.
The data is very high speed; converting all of it to Python objects (which I don't need) is not an option here. Inside Python, I just pass on byte arrays.

'My talking about adding a limit is with regards to my script' -> I misunderstood, thanks.

'There is no free lunch and a skiplist isn't magic.' -> Of course I agree :). However, skiplists as well as binary tree searches do scale well; O(log(n)) applies to both. It all comes down to real-life testing in the end. Using ZLEX solves all my performance problems. Zooming in on an alphabetical index entry stays well below 1 ms, even with 10 million rows. This is wall-clock time. I like the Wikipedia entry on skip lists: http://en.wikipedia.org/wiki/Skip_list . You might (well, given your expertise, probably not) be a bit too pessimistic about 'still checking *all* levels'.

'the code is a 99% copy/paste from a prefix of the zslInsert() function' -> Well spotted and 100% correct; this is however intentional. Same for zzlInsert(), btw. Setting this small piece of code aside helps with patching, and helps with scanning for changes in the original code. It should not be in the final patch, I'll make sure of that. It will eventually be refactored, if it gets adopted by Antirez.

'null vs tab' -> yes, we use null characters as well. I just omitted that detail (amongst others) for simplicity.

'If you use scores as a pre-match' / 'unless you have a lot of strings that match on more than a 7 character prefix' -> Excellent description. This was the reason why I had to write ZLEX: most of our comparisons exceed the 7 character prefix. A previous solution converted ASCII strings into 7 character chunks, allowing for unlimited string length and using scores only. The bookkeeping was however very cumbersome, and caused too big a hit on performance. I needed a simpler solution; ZLEX was the result. I always strive for simplicity. The 7 character prefix *will* be encoded as a score (especially now that I've read your insights). That's why I implemented BREAKMODE NOBREAK in the design.

'"BREAKMODE" term is unnecessary' -> I think that's true. I did it like this, to block simultanious use of POSTALPHASTOP and NOBREAK. I'll adjust it.

'POSTALPHASTOP' -> English is not my native language (it's Dutch). Good point lol, I'll think of a better term ;)

'what happens in the case of NOBREAK if ...' -> You are right, I'll update the wiki.

'strange semantic in the ZSET context' -> Yes, that's true. I did it like this, to be able to get a 'preroll' in the results. My first implementation also had the standard OFFSET parameters, but this overlapped with the requested range (so I tossed it). Semantically ZLEX *is* different from the rest of the bunch, since it's relative to a cursor. I'm open to suggestions.

'In your example, you enter "b", but it returns "Bernard" and "Bob", neither of which would have matched.' -> These are the post-aggregate results, as explained in the following paragraph. Apparently this is not clear, I'll improve that.

Thanks again, TW

tw-bert

Mar 29, 2014, 9:10:56 AM
to redi...@googlegroups.com
My apologies for the crappy English in my previous post. "Enthusiasm" vs "non-native language" vs "throttling my trigger finger".

About tossing 'BREAKMODE':

Suggestion:
ALLSCORES -> replaces 'BREAKMODE NOBREAK'
BEGINS -> replaces 'BREAKMODE POSTALPHASTOP'

This describes the executed functionality less precisely, but it's shorter and simpler.
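
With that change, the signature would presumably read something like:

ZLEX key score asciistring start stop [ALLSCORES|BEGINS] [WITHSCORES] [WITHHEADER]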

Any feedback appreciated,

Kind regards, TW

Matt Stancliff

Mar 30, 2014, 12:18:01 PM
to redi...@googlegroups.com

On Mar 29, 2014, at 1:27 AM, Josiah Carlson <josiah....@gmail.com> wrote:

> Generally speaking, it is difficult to get new commands added to Redis.

Yup! To help people maintain their own commands, I released pluggable command support last week: https://matt.sh/advanced-redis-command-loading

You can maintain your commands outside of the Redis codebase, you can load custom commands at startup or after the server is running, and you can reload your commands at runtime (giving you zero-server-restart debugging/development).


-Matt

tw-bert

Mar 30, 2014, 2:09:47 PM
to redi...@googlegroups.com
Hi Matt, this seems like excellent news. I was always under the assumption that Antirez was opposed to plugin support, I assume this has changed? I had a quick look at the unstable branch of antirez-redis, did not see it (but might have overlooked it). Any rough estimate when it will be part of redis-stable?

It looks excellent, with this in mind I might make ZLEX less generic. It will be interesting to see how/when online Redis services like Heroku will supply support for this.

Thank you very much, TW

Matt Stancliff

Mar 30, 2014, 4:37:21 PM
to redi...@googlegroups.com

On Mar 30, 2014, at 2:09 PM, tw-bert <tw.be...@gmail.com> wrote:

> Hi Matt, this seems like excellent news. I was always under the assumption that Antirez was opposed to plugin support, I assume this has changed?

Nope, nothing changed. The brilliant thing about having all the source is: you don’t have to ask for permission to try new things.


> I had a quick look at the unstable branch of antirez-redis, did not see it (but might have overlooked it).

The pluggable command code sits on top of Redis without changing any existing features. I consider it stable for running all built-in Redis commands. You can grab a version based on unstable or one backported to a released Redis version.


> Any rough estimate when it will be part of redis-stable?

There’s a Redis 2.8.8 with the dynamic loading support under Releases at https://matt.sh/dynamic-redis

I don’t actually expect it to be integrated into the primary distribution, but some people may find it useful. I have a few more features planned too. If you run into any unexpected limitations or think of additional features, let me know.


> It looks excellent, with this in mind I might make ZLEX less generic. It will be interesting to see how/when online Redis services like Heroku will supply support for this.

Thanks! The command module approach does require everything to be more self-contained, but that's better for sharing. Ideally people will still separate functionality into different files so others can re-use any custom utility functions they come up with (kinda like how you add code to 5 different .c files in Redis now. Hopefully people will continue to write in a separation-of-concerns fashion and not make one giant 2,000 line mymodule.c).

I hadn’t considered a hosted service allowing loadable modules. It would require two things: you should build against the source of whatever Redis version they are running — and — they have to give you a way to put your module on the server where Redis is running so it can open the shared library (or maybe we just need a module-add-from-binary setting where you give it the full contents of a module and the server deploys it for you).


-Matt

tw-bert

Mar 30, 2014, 5:41:13 PM
to redi...@googlegroups.com
Thanks for the clarification.

'The brilliant thing about having all the source is: you don’t have to ask for permission to try new things.' -> That's true, but the less-brilliant thing is that you have to patch, and that your private fork will never be as thoroughly tested 'in the wild' as the official branch. I understand and appreciate your efforts to contain this. It will however only succeed if a lot of people adopt your dynamic-redis fork (if I understand correctly).

For now, I'll stick with just a ZLEX patch. I coded it in a way that lets me easily (and automatically) apply it on top of the last stable release, simply by using the patch command on *nix (patch -p1 -i zlex-extension-for-redis.patch --binary --batch). As far as dynamic-redis goes, I'll keep a close eye on it. This is not reluctance by the way, but I need to limit the scope of this project (which is part of a much bigger project). No unlimited resources here, I'm afraid.

About 3rd-party modules in hosted Redis: there are analogies. Heroku does support pip for Python modules, for example, which can of course be binary (Cython and the like). I do think a similar extension repository would give your efforts a nice foundation to grow upon.

Best of luck, TW