I was thinking about data structures for an unrelated project
and I thought about something: the Redis keyspace is (if I
understand it correctly) a Hash. Yet it is a common practice
to store keys that look like this:
Something1:Something2:Something3
where Something1 will be extremely frequent, Something2 will
be a bit less frequent and Something3 will be the least
frequent.
Knowing that, would it not be a better idea to use something
like a critbit tree instead of a hash? It would save memory
by compressing the keys and it could be faster too. For
instance, Varnish uses critbit trees by default ([1]).
More information about these trees, which are basically
binary radix trees, can be found in the links below.
[1] http://kristianlyng.wordpress.com/2010/07/30/varnish-2-1-3-use-it/
[2] http://cr.yp.to/critbit.html
[3] http://www.imperialviolet.org/binary/critbit.pdf
--
Pierre 'catwell' Chapuis
Regards,
- Josiah
> --
> You received this message because you are subscribed to the Google Groups
> "Redis DB" group.
> To post to this group, send email to redi...@googlegroups.com.
> To unsubscribe from this group, send email to
> redis-db+u...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/redis-db?hl=en.
>
>
On to Dvir's point about putting up a fork w/ a critbit overriding the
hash-table, this is not that much work to do. It basically comes down
to overriding the functions in dict.c w/ critbit functions. If done
correctly, even if it is not merged into the main branch, it could be
applied as a patch (dict.c does not change much - cause its SOLID
code), meaning it could be applied indefinitely (barring dict.c
changes) and wouldn't turn into bit-rot :)
> I was thinking of something more modest, that doesn't really act as a complete
> server, but rather just lets you run optimized KEYS queries against it.
> say you fire up such a slave, and tell it something like "INDEXKEYS user:*
> product:*" etc
> each new key that matches these patterns is inserted into a critbit. Later
> you can simply do something like "IKEYS user:a*" to query the critbit tree
> and just get keynames.
> Actually since the overhead in memory is controlled by the user (and is 0 by
> default), it might not be such a bad idea to add this to Redis itself if
> enough people want it and the powers that be like it :)
> I know I'd use it.
I think that I would use it as well. That or nested hashes.
Alexander.
> I was thinking of something more modest, that doesn't really act as a
> complete server, but rather just lets you run optimized KEYS queries
> against it.
> say you fire up such a slave, and tell it something like "INDEXKEYS
> user:* product:*" etc
> each new key that matches these patterns is inserted into a critbit.
> Later you can simply do something like "IKEYS user:a*" to query the
> critbit tree and just get keynames.
> Actually since the overhead in memory is controlled by the user (and
> is 0 by default), it might not be such a bad idea to add this to
> Redis itself if enough people want it and the powers that be like it
> :)
> I know I'd use it.
This is an interesting idea but it actually *increases* memory use
instead of reducing it, which is one of my goals.
I use Redis to implement complex data structures such as inverted
indices and trees. A strategy to implement a tree in a flat keyspace
if only the leaves hold values is:
Node keys store an integer which is the number of children they have.
Nodes whose value is 0 are terminal, and if their key is "x" then
there is a key called "v:x" that stores the value.
The root is the "1" key.
For instance (key -> value):
1 -> 3
1:1 -> 1
1:1:1 -> 0
v:1:1:1 -> "a"
1:2 -> 0
v:1:2 -> "b"
1:3 -> 2
1:3:1 -> 0
v:1:3:1 -> "c"
1:3:2 -> 1
1:3:2:1 -> 0
v:1:3:2:1 -> "d"
This represents:
1
|- 1:1
|  |- 1:1:1
|     |- "a"
|- 1:2
|  |- "b"
|- 1:3
   |- 1:3:1
   |  |- "c"
   |- 1:3:2
      |- 1:3:2:1
         |- "d"
If your nodes hold a value you can replace the integer with a set of
sub-components. For instance, imagine you want to store web pages:
example.com
|- stuff
|  |- x.html
|  |  |- content for "http://example.com/stuff/x.html"
|  |- y.html
|     |- content for "http://example.com/stuff/y.html"
|- about
   |- content for "http://example.com/about"
The key example.com:stuff would be a set that contains "x.html"
and "y.html".
All these structures have one thing in common: the keys are prefixed
by stacked namespaces. ORMs do this too, so the pattern is very common.
With such structures, in real life, long keys with their first
80% or more in common are frequent, and critbit trees are
designed to store such data very efficiently, so what I'm saying
is that using them to replace hashes could halve the memory used
by keys.
Moreover, if we go back to the URLs example, most people don't
actually store the content of the page in the tree; they just
want to know that they have it on disk. In this case, the
cumulative length of the keys is of the same order as the
cumulative length of the values, which means that you do want
to reduce the memory usage of the keys.
This has become a very long post, but my point is: yes we would
benefit from what Dvir proposes, but it would not bring all the
advantages of critbit trees as an alternative to hashes.
That's absolutely true, but there is a different issue. In diskstore,
in expires, and in other parts of the Redis core we do sampling to
obtain elements that are good candidates to expire. At this level we
perform a lot of lookups in a loop. In this case the O(1) vs O(logN)
difference will become a performance hit.
Also we have RANDOMKEY and other randomized operations. The random
sampling itself is based on the ability of getting random elements. To
do this with a tree we need to augment the tree, and use more memory,
and more CPU.
Final note: in a cluster scenario getting ranges of keys is not
possible anyway in a straightforward way.
Cheers,
Salvatore
--
Salvatore 'antirez' Sanfilippo
open source developer - VMware
http://invece.org
"We are what we repeatedly do. Excellence, therefore, is not an act,
but a habit." -- Aristotle
> That's absolutely true, but there is a different issue. In diskstore,
> in expires, and in other parts of the Redis core we do sampling to
> obtain elements that are good candidates to expire. At this level we
> perform a lot of lookups in a loop. In this case the O(1) vs O(logN)
> difference will become a performance hit.
O(x) performance only means something relative to an operation.
Hash operations are actually O(N) (N being the length of the key)
because the hashing function itself is O(N).
> Also we have RANDOMKEY and other randomized operations. The random
> sampling itself is based on the ability of getting random elements.
> To
> do this with a tree we need to augment the tree, and use more memory,
> and more CPU.
That's definitely true.
> Final note: in a cluster scenario getting ranges of keys is not
> possible anyway in a straightforward way.
Not in a straightforward way, that's true. But it could theoretically
be made extremely efficient if Redis tries to regroup subtrees of the
critbit tree on the same instance... That could be a job for a
complementary critbit tree layer.
However it's true that I was only thinking about the single-instance
use case.
--
Pierre 'catwell' Chapuis
I was not clear, let's assume the key is constant size, so you can
optimize away this as a constant factor.
If I do M lookups for random sampling to evict keys on max memory, and
I've O(1) lookups, this is O(M)
If instead I've O(logN) lookups complexity, this is O(M*LogN)
complexity, which can easily be an order of magnitude more. So where I
planned, for instance, to block at most 1 millisecond on an entry-level
server, I would instead see query latencies on the order of 10
milliseconds from time to time.
> If I do M lookups for random sampling to evict keys on max memory,
> and
> I've O(1) lookups, this is O(M)
I've had a look at that and it looks strange to me but probably I don't
understand the score well.
According to what I saw:
- Keys that can expire are stored in a hash: db->expires
  In this hash, the keys are keys in the DB and the values are
  expiration times
- In activeExpireCycle, for each db, you get a fixed number of random
  keys and, for each of those, you delete it if it has expired, i.e. if
  its expiration time is earlier than "now"
- To get random keys in a hash you use dictGetRandomKey which doesn't
look like it's O(1). It even looks like its runtime is
non-deterministic because:
- first it selects a bucket with something like:
    do {
        h = random() & d->ht[0].sizemask;
        he = d->ht[0].table[h];
    } while(he == NULL);
  This has a probabilistic complexity depending on the number of
  non-empty buckets, no? However I don't know how frequent empty
  buckets are.
- then it gets a random element in a list, which is O(len(list)),
  but I suspect that most lists are short (otherwise the number of
  buckets would change).
What I don't understand is why you use such a hash to store keys
that will expire and not a sorted set with the expiration date as
the score. Then you wouldn't have to do random lookups at all and
you could simply expire the keys with the oldest expiration date...
--
Pierre 'catwell' Chapuis
> I don't understand the score well.
The code... Although score is a nice metaphor ;)
--
Pierre 'catwell' Chapuis
OK, I thought so. I checked the code: when you resize you double the
number of buckets, so if the best possible number of operations is X,
the average just after a resize is 2X, and there's always less than a
0.2% chance that it takes more than 10 operations at this step.
Now that doesn't explain why you use a random algorithm instead
of a deterministic one with a sorted data structure.
Also, I don't really understand why you say that lookups would
be O(logN) in a critbit tree. They are not balanced trees and
any lookup for a key K is O(len(K)). Since you factor out len(K)
as a constant it would be O(1).
--
Pierre 'catwell' Chapuis
Not sure I understand the question, please can you add some detail?
> Also, I don't really understand why you say that lookups would
> be O(logN) in a critbit tree. They are not balanced trees and
> any lookup for a key K is O(len(K)). Since you factor out len(K)
> as a constant it would be O(1).
Now I see they are Radix Trees, so yes O(1) lookups (we can consider
the key fixed size for this matter).
Btw the point is, I don't consider range queries in the key space very
important nor a fundamental operation for data organization in Redis,
actually it is more or less an anti-pattern IMHO.
So I don't want to change a lot of code for no significant gain.
I'm much more interested in providing more range query capabilities
for sorted sets, for instance in the form of a command able to get a
lexicographic range, but since I consider the API to be more or less
fixed for now, I'm not doing much about it. We really need to bring
forward cluster, diskstore, the memory footprint of sorted sets, and
RDB save/load performance for now.
But I'll keep this data structure in mind in case we ever need a
substitute, so thank you for pointing it out.
>> Now that doesn't explain why you use a random algorithm instead
>> of a deterministic one with a sorted data structure.
>
> Not sure I understand the question, please can you add some detail?
I was thinking you could make db->expires a zset (with score =
expiration date) instead of a hash and simply expire the oldest keys.
Then I looked at redis.io and saw ZRANGEBYSCORE is O(log(N)+M), so it
would not be more efficient. No silver bullet I guess.
> Btw the point is, I don't consider range queries in the key space
> very
> important nor a fundamental operation for data organization in Redis,
> actually it is more or less an anti-pattern IMHO.
> So I don't want to change a lot of code for no significant gain.
I can understand that. I wanted this more for the memory usage gains
than for range queries though.
> I'm much more interested in providing more range query capabilities
> for sorted sets, for instance in the form of a command able to get a
> lexicographic range, but since I consider the API to be more or less
> fixed for now, I'm not doing much about it. We really need to bring
> forward cluster, diskstore, the memory footprint of sorted sets, and
> RDB save/load performance for now.
Yes, those are real priorities. I'm probably going to need cluster
someday so I agree you should focus on it ;)
--
Pierre 'catwell' Chapuis
> Pierre wrote why not use a ZSET to store expire keys, so it would be
> guaranteed that the oldest (not random) keys were getting evicted and
> also walking a ZSET is most likely quicker than
> N*random_hash_table_lookups. The ZSET approach would require
> additional memory and add an additional data structure, but like
> Pierre, I have also asked myself why random expiration is better (and
> I am pretty sure I am missing something, there are tons of tradeoffs
> in this problem). BTW, what data structure does memcached use to store
> eviction keys and expire them? I have no clue.
Oh ok, now I get it. The reply is very easy: our current solution has
zero additional memory used per key.
> Yeah, true. On a related note: I read your hackernews response to
> simonw on ZSET intersections across cluster nodes and was really
> impressed at the MIGRATE to temp intersection nodes solution. I think
> you really nailed that extremely difficult problem by abstracting it
> out of the server, brilliant :)
Thanks ;)
Cheers,
Salvatore
The problem is that we also evict things without an expire set using
LRU, so this solution can't work in the general case without using
additional memory. Given that, better to have a single approach for
all the cases.
Btw from randomized simulations I did, the current approach is pretty
similar to precise LRU in the long run.
> The problem is that we also evict things without an expire set using
> LRU, so this solution can't work in the general case without using
> additional memory. Given that, better to have a single approach for
> all the cases.
OK. The only problem with the current approach is that if you have
e.g. 100k keys that expire in 2 days and 1 key that is actually
expired, there is almost no chance that this key ever gets collected.
> Btw the point is, I don't consider range queries in the key space very
> important nor a fundamental operation for data organization in Redis,
> actually it is more or less an anti-pattern IMHO.
Because a core assumption of the Redis key-value model is that keys
are only accessed via fast lookups.
This makes it trivial to distribute in a completely horizontally
scalable fashion, without inter-node communication or other complex
merging.
If this is no longer true we have a different model.
It's fine to abuse the single-instance model, so I encourage your usage.
But designing Redis itself with only the single instance in mind at this
point would be dangerous.