There have been a few proposals to offer the functionality you desire
(or something similar), but they have thus far failed to gain momentum
or a patch. My guess as to why: people coming from the relational
database world expect a "cursor" to offer a view of the data frozen at
a fixed point in time, whereas Redis only guarantees that while one
command is executing, no other commands are executing.
If you want, you can get very similar functionality by adding all of
your keys to a zset as they are inserted, which allows for very fast
slicing operations. You would have to decide on a consistent score,
and on how to paginate over the items in the zset so as not to miss
keys that are being inserted while you are paginating over them. It
may be a little tedious, but it's about the best that Redis offers
right now for what you want.
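As a minimal sketch of the idea (assuming the redis-py client; the
zset name 'keylist' and timestamp-based scores are illustrative
choices, not requirements):

import time
import redis

r = redis.Redis()

def insert(key, item):
    # store the data as usual, and index the key in a zset;
    # the insertion timestamp serves as a consistent score
    r.sadd(key, item)
    r.zadd('keylist', {key: time.time()})

def get_page(page, count=100):
    # slice one "page" of keys out of the index
    return r.zrange('keylist', page * count, (page + 1) * count - 1)

A fuller example, including how to avoid re-scoring keys you have
already indexed, follows later in the thread.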
Regards,
- Josiah
As you are adding items to all of your sets with SADD <key> <item>,
you can check ZSCORE keylist <key>. If the score returned is null,
then you can ZADD keylist <score> <key>. If you don't really care
about the order, you can simply set the score to a random number each
time you perform the ZADD. If you do care about the order, you can
construct your score to be relevant to the order in which you want
the keys returned. To make this faster, you can do these kinds of
things in bulk. For example, in Python...
import random

def add_items(sequence, rconn):
    known = set()
    # set up a pipeline to reduce round trips to a minimum
    p = rconn.pipeline(False)
    for i, (key, item) in enumerate(sequence):
        p.sadd(key, item)
        known.add(key)
        # flush the queued SADDs every 100 items
        if i and not i % 100:
            p.execute()
    # flush any remaining SADDs so the next execute() returns only
    # the ZSCORE replies
    p.execute()
    # queue one ZSCORE per key we touched, fetching all replies in a
    # single round trip
    keys = list(known)
    for k in keys:
        p.zscore('keylist', k)
    for k, score in zip(keys, p.execute()):
        # a null score means the key is not yet in the index zset
        if score is None:
            p.zadd('keylist', {k: random.random()})
    p.execute()
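Calling it might look like this (a sketch; the connection and the
(key, member) pairs are placeholders):

import redis

rconn = redis.Redis()
data = [('colors', 'red'), ('colors', 'blue'), ('sizes', 'large')]
add_items(data, rconn)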
When you want to paginate through your keys, you can simply perform
'ZRANGE keylist 0 <count-1>' to get the first page, 'ZRANGE keylist
<count> <count*2-1>' to get the second page, etc. In Python:
def get_keys(rconn, page, count=100):
    return rconn.zrange('keylist', page*count, (page+1)*count - 1)
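Walking every page until the index is exhausted might then look like
this (a sketch; the connection setup and per-key processing are
placeholders):

import redis

rconn = redis.Redis()
page = 0
while True:
    keys = get_keys(rconn, page)
    if not keys:
        break
    for key in keys:
        pass  # process each key here
    page += 1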
> We have a lot of keys. Is there a limitation on the number of keys in
> ZSET?
There is no hard limit, other than the amount of memory your system
has and whether Redis was compiled as a 32-bit or 64-bit binary.
- Josiah
ITERCREATE my_iter (returns true/false)
ITERGET my_iter (get the current key in the iterator, perhaps also the value type)
ITERNEXT my_iter (advance by one; perhaps also an atomic ITERGETNEXT, getting and advancing in one go)
ITERREWIND my_iter
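None of these commands exist in Redis; but if they did, client-side
usage might look something like this sketch (using redis-py's generic
execute_command, with made-up reply conventions: ITERGET is assumed to
return null once the iterator is exhausted):

import redis

r = redis.Redis()

def iterate_keys(name='my_iter'):
    # Hypothetical: ITERCREATE returns true/false as proposed above.
    if not r.execute_command('ITERCREATE', name):
        return
    while True:
        # Hypothetical: assume ITERGET returns null past the end.
        key = r.execute_command('ITERGET', name)
        if key is None:
            break
        yield key
        r.execute_command('ITERNEXT', name)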
1. If you have a database of 15M keys (I do!), inserting them into a
ZSET will probably take a lot of time and consume a lot of memory.
2. An iterator will react immediately to set/del operations performed
after it was created, but a ZSET will not.
Of course, you can do double inserts and add every inserted key to the
keys zset, but that will hurt performance.
It's important to remember, once again, that KEYS is meant for
administrative/debugging/offline operations and should not be used
as a strategy for laying out your application data.
Is this the kind of use you have in mind?
If so, would it be possible to set up a slave on which you'd run the
slow command and perform your calculations?
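For instance (a sketch; the host names are placeholders), the
expensive KEYS call could be pointed at the slave while the master
keeps serving writes:

import redis

master = redis.Redis(host='redis-master')    # placeholder host
replica = redis.Redis(host='redis-replica')  # placeholder host

# Run the slow, blocking command against the replica only, so the
# master stays responsive.
all_keys = replica.keys('*')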
In the case of a single writer, my code could be modified to use a
counter instead of a random score, at which point you can effectively
guarantee that previously unused keys will come later, with a small
race condition that may cause some keys to shift later in the order
(though this can be worked around with a little more code).
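A sketch of that modification, using INCR as the score source (the
counter key name is an illustrative choice):

def index_key(rconn, key):
    # only assign a score the first time we see the key
    if rconn.zscore('keylist', key) is None:
        # a monotonically increasing counter: with a single writer,
        # new keys always sort after previously indexed ones
        score = rconn.incr('keylist:counter')
        rconn.zadd('keylist', {key: score})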
The choice of a zset, as opposed to a plain set or a list, was due to
the speed of accessing an arbitrary "page" in a zset: O(log n + k),
where n is the size of the zset and k is the number of items to
return. With a list it's O(m + k), where m is the offset into the
list and k is the number of items to return, which is O(n) on average
over the entire list. And with a plain set it's always O(n), because
you can't page through one at all. There is a trick using lists and
pushing/popping items to reduce fetch time, but it is a bit ugly.
- Josiah