On Sat, Jun 16, 2012 at 2:45 PM, Dvir Volk <dv...@everything.me> wrote:
> Thanks Josiah! I was hoping to get a response from you on this :)
You could always email me directly. No one ever thinks of that ;)
> That's what I had in mind, I just figured there are ways to compress the
> representation even more than 6 bytes, while keeping the ordering intact.
>
> The thing is, it looks like this will not allow ZRANGE paging on the key,
> because if you have, say, 100 strings with the same prefix, you are not
> guaranteed anything if you get only 10 of them.
> It will allow ZRANGEBYSCORE filtering, but with strings that's not a common
> need.
It depends on what your members are. Commonly, people will use scores
of 0, then use the string they want to sort on as the member, followed
by some sort of null+id suffix (as necessary). By deriving the score
from the string (string -> score), you get primary sorting on the
score (which will actually be faster than character-by-character
strcmp() calls), and secondary sorting on the member. You can trim the
6-character prefix from the member, so you aren't re-performing the
same comparison.
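A minimal sketch of the string -> score idea, assuming UTF-8 strings
with no NUL bytes in them. It simulates a sorted set with Python's
sort on (score, member) pairs rather than talking to Redis, and the
helper names (prefix_score, make_entry, decode_entry) are hypothetical.
The first 6 bytes become a big-endian integer score; the member holds
the remaining bytes plus a NUL and an id. Strings sharing a 6-byte
prefix tie on score and fall back to byte-wise member ordering (which
is how Redis orders equal-score members), so the full string order
is kept even when 100 strings share a prefix.

```python
from typing import List, Tuple

# Hypothetical helpers, not part of Redis or any client library.
PREFIX_BYTES = 6  # 48 bits: fits exactly in a double


def prefix_score(s: str) -> int:
    """Turn the first 6 UTF-8 bytes of s into an order-preserving integer."""
    head = s.encode("utf-8")[:PREFIX_BYTES]
    # Pad short strings with NUL bytes so "app" still sorts before "apple".
    return int.from_bytes(head.ljust(PREFIX_BYTES, b"\x00"), "big")


def make_entry(s: str, obj_id: str) -> Tuple[float, bytes]:
    """Build a (score, member) pair: member = rest of string + NUL + id."""
    rest = s.encode("utf-8")[PREFIX_BYTES:]
    return float(prefix_score(s)), rest + b"\x00" + obj_id.encode("utf-8")


def decode_entry(score: float, member: bytes) -> str:
    """Rebuild the original string (assumes it contains no NUL bytes)."""
    head = int(score).to_bytes(PREFIX_BYTES, "big").rstrip(b"\x00")
    rest = member.split(b"\x00", 1)[0]
    return (head + rest).decode("utf-8")


words = ["banana", "applet", "app", "applesauce", "apply", "apple"]
# Sorting on (score, member) mimics a sorted set's ordering:
# by score first, then by member bytes when scores are equal.
entries: List[Tuple[float, bytes]] = sorted(
    make_entry(w, str(i)) for i, w in enumerate(words)
)
print([decode_entry(score, member) for score, member in entries])
# → ['app', 'apple', 'applesauce', 'applet', 'apply', 'banana']
```

Note that "applesauce" and "applet" are separated by the score alone
("apples" < "applet"), while any longer strings sharing all 6 prefix
bytes would be ordered by the trimmed member.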
> So for now I just used Python hashes trimmed to 48 bits on the whole string,
> just to get an exact match on a string using ZRANGEBYSCORE.
> It's no different from storing a set for each string with object keys, but much
> simpler and cheaper to update if an attribute's value changes.
Python's hashes are not necessarily unique. And depending on your
command-line arguments, hash('foo') != hash('foo') on subsequent runs
(http://bugs.python.org/issue13703, which addresses the same security
issue that Redis addressed in 2.6).
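If you still want a hash-based exact-match score, one option (a swap
from the built-in hash(), not something Redis provides) is a
cryptographic digest truncated to 48 bits: SHA-1 from hashlib gives
the same value in every process, regardless of hash randomization.
It is still not unique: collisions become likely once you store on the
order of 2**24 (about 16.8 million) distinct values, so the sketch below
confirms the stored value after the score filter. The zset list, the
values dict, and both function names are hypothetical stand-ins for a
sorted set and the objects' attribute storage.

```python
import hashlib
from typing import Dict, List, Tuple


def stable_hash48(s: str) -> int:
    """48-bit hash that is identical across processes and runs.

    Unlike the built-in hash(), a hashlib digest does not depend on
    hash randomization (PYTHONHASHSEED / -R).
    """
    digest = hashlib.sha1(s.encode("utf-8")).digest()
    return int.from_bytes(digest[:6], "big")


def exact_match(zset: List[Tuple[float, str]],
                values: Dict[str, str], target: str) -> List[str]:
    """Filter by score (like ZRANGEBYSCORE h h), then verify the value,
    because two different strings can still share a 48-bit hash."""
    h = float(stable_hash48(target))
    return [key for score, key in zset if score == h and values[key] == target]


# Object keys mapped to an attribute value; the "sorted set" scores each
# object key by the hash of its value.
values = {"user:1": "red", "user:2": "blue", "user:3": "red"}
zset = sorted((float(stable_hash48(v)), k) for k, v in values.items())
print(exact_match(zset, values, "red"))  # → ['user:1', 'user:3']
```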
With FP doubles, you can actually use 53 bits without issue with
integers (the significand stores 52 bits explicitly, but the leading 1
is implicit for normalized values, so every integer up to 2**53 is
represented exactly; only denormalized values lack that implicit bit).
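A quick way to check the 53-bit claim in Python, whose float is an
IEEE 754 double on common platforms: round-trip an integer through
float and see whether it changes. The survives_double helper is a
hypothetical name used only for this illustration.

```python
import sys


def survives_double(n: int) -> bool:
    """True if n converts to a double and back without changing."""
    return int(float(n)) == n


print(sys.float_info.mant_dig)        # 53 significand bits (incl. implicit 1)
print(survives_double(2**53))         # True
print(survives_double(2**53 + 1))     # False: rounds to 2**53
print(survives_double(2**48 - 1))     # True: any 6-byte prefix fits
print(survives_double(2**56 - 1))     # False: a full 7-byte prefix does not
```

So a 6-byte (48-bit) prefix leaves 5 bits to spare, while a full
7-byte (56-bit) prefix would lose precision.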
- Josiah