Hi Josiah, excellent feedback, thank you for your insights. *Very* valuable.
'Generally speaking, it is difficult to get new commands added to Redis.' -> noted, and understandable. I do have some hopes, because of the ZRANGEBYLEX proposal issued by Antirez himself.
'rom': Thanks for pointing out your alternative. 'rom' looks very good (far better than another mapper, 'stdnet', that I tried to adopt), but I don't need the data inside Python, at least not in most use cases. The data streams from a proprietary language (Progress ABL) through a named pipe into Redis. I do use an embedded Python thread as a broker, but not for the data. I will look into 'rom' and will definitely adopt some ideas.
The data arrives at very high speed; converting it all to Python objects (which I don't need) is not an option here. Inside Python, I just pass on byte arrays.
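To illustrate what I mean by passing on byte arrays (a minimal sketch, not the actual broker code; the key name and payload format are made up for the example): the broker can frame a raw Redis command in the RESP protocol entirely as bytes, so the payload is never decoded into a Python string.

```python
def resp_command(*parts: bytes) -> bytes:
    """Frame a Redis command in RESP without decoding the payload.

    Each part stays a raw byte string; nothing is converted to a
    Python str, so arbitrary binary data passes straight through.
    """
    out = [b"*%d\r\n" % len(parts)]
    for p in parts:
        out.append(b"$%d\r\n%s\r\n" % (len(p), p))
    return b"".join(out)

# Hypothetical example: a null-separated record read from the named
# pipe is forwarded as-is into a sorted set.
payload = b"smith\x00john\x00123"
cmd = resp_command(b"ZADD", b"myzset", b"0", payload)
# cmd can now be written directly to the Redis socket.
```

The point of the sketch is only that the hot path stays byte-oriented from the pipe to the socket; no object conversion happens in between.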
'My talking about adding a limit is with regards to my script' -> I misunderstood, thanks.
'There is no free lunch and a skiplist isn't magic.' -> Of course I agree :). However, skip lists as well as balanced binary search trees scale well; O(log n) applies to both. It all comes down to real-life testing in the end. Using ZLEX solves all my performance problems: zooming in on an alphabetical index entry stays well below 1 ms, even with 10 million rows. This is wall-clock time. I like the Wikipedia entry on skip lists: http://en.wikipedia.org/wiki/Skip_list. You might (well, given your expertise, probably not) be a bit too pessimistic about 'still checking *all* levels'.
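To show why a skip-list search does not check all nodes on all levels, here is a textbook sketch (not Redis's t_zset.c code, though it uses the same promotion probability p = 0.25): the search moves right only a few expected steps per level before dropping down, so the total number of nodes touched is O(log n).

```python
import random

class Node:
    __slots__ = ("key", "forward")
    def __init__(self, key, level):
        self.key = key
        self.forward = [None] * level  # one next-pointer per level

class SkipList:
    MAX_LEVEL = 32
    P = 0.25  # promotion probability, same value Redis uses

    def __init__(self):
        self.head = Node(None, self.MAX_LEVEL)
        self.level = 1

    def _random_level(self):
        lvl = 1
        while random.random() < self.P and lvl < self.MAX_LEVEL:
            lvl += 1
        return lvl

    def insert(self, key):
        update = [self.head] * self.MAX_LEVEL
        x = self.head
        # Find the rightmost node before `key` on every level.
        for i in range(self.level - 1, -1, -1):
            while x.forward[i] and x.forward[i].key < key:
                x = x.forward[i]
            update[i] = x
        lvl = self._random_level()
        self.level = max(self.level, lvl)
        node = Node(key, lvl)
        for i in range(lvl):
            node.forward[i] = update[i].forward[i]
            update[i].forward[i] = node

    def search(self, key):
        """Return (found, nodes_visited). The walk goes right a few
        steps per level, then drops a level; it never scans a whole
        level, so nodes_visited stays O(log n)."""
        visited = 0
        x = self.head
        for i in range(self.level - 1, -1, -1):
            while x.forward[i] and x.forward[i].key < key:
                x = x.forward[i]
                visited += 1
        x = x.forward[0]
        return (x is not None and x.key == key), visited
```

With tens of thousands of keys, `nodes_visited` typically stays on the order of a few dozen rather than anywhere near n, which is the intuition behind the sub-millisecond ZLEX lookups above.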
'the code is a 99% copy/paste from a prefix of the zslInsert() function' -> Well spotted and 100% correct; this is, however, intentional. The same goes for zzlInsert(), by the way. Keeping this small piece of code separate helps with patching, and helps with scanning for changes in the original code. It should not be in the final patch, I'll make sure of that. It will eventually be refactored, if it gets adopted by Antirez.
'null vs tab' -> yes, we use null characters as well. I just omitted that detail (amongst others) for simplicity.
'If you use scores as a pre-match' / 'unless you have a lot of strings that match on more than a 7 character prefix' -> Excellent description. This is exactly why I had to write ZLEX: most of our comparisons exceed the 7-character prefix. A previous solution converted the ASCII coding into 7-character chunks, allowing unlimited string length using scores only, but the bookkeeping was very cumbersome and caused too big a hit on performance. I needed a simpler solution; ZLEX was the result. I always strive for simplicity. The 7-character prefix *will* be encoded as a score (especially now that I've read your insights). That's why I implemented BREAKMODE NOBREAK in the design.
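For completeness, a sketch of what I mean by encoding the 7-character prefix as a score (a hypothetical helper, not the actual patch code): seven 7-bit ASCII characters are 49 bits, which fit losslessly in the 53-bit mantissa of an IEEE-754 double, so the resulting scores sort exactly like the prefixes themselves.

```python
def prefix_score(s: bytes) -> float:
    """Pack up to 7 ASCII bytes (7 bits each = 49 bits) into a double.

    49 bits fit exactly in a double's 53-bit mantissa, so ZSET score
    ordering matches the lexicographic ordering of the 7-byte prefix.
    Shorter strings are padded with 0, which sorts before any
    printable character.
    """
    score = 0
    for i in range(7):
        c = s[i] if i < len(s) else 0   # pad short strings with 0
        if c > 0x7F:
            raise ValueError("ASCII only")
        score = (score << 7) | c
    return float(score)
```

Members sharing the same 7-byte prefix get the same score and are then disambiguated lexicographically, which is where a ZLEX-style comparison takes over.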
'"BREAKMODE" term is unnecessary' -> I think that's true. I did it like this to block simultaneous use of POSTALPHASTOP and NOBREAK. I'll adjust it.
'POSTALPHASTOP' -> English is not my native language (it's Dutch). Good point lol, I'll think of a better term ;)
'what happens in the case of NOBREAK if ...' -> You are right, I'll update the wiki.
'strange semantic in the ZSET context' -> Yes, that's true. I did it like this to be able to get a 'preroll' in the results. My first implementation also had the standard OFFSET parameters, but these overlapped with the requested range (so I tossed them). Semantically, ZLEX *is* different from the rest of the bunch, since it's relative to a cursor. I'm open to suggestions.
'In your example, you enter "b", but it returns "Bernard" and "Bob", neither of which would have matched.' -> These are the post-aggregate results, as explained in the following paragraph. Apparently this is not clear; I'll improve that.
Thanks again, TW