On Wed, May 9, 2012 at 4:28 AM, Clifton Cunningham
<clifton.c...@gmail.com> wrote:
> Hi, I've looked for other articles on this and seen some ideas, discussions
> around new features etc., but no solutions.
>
> We are using redis in production here for near real time counts of activity
> by minute, where in any given day we can get upwards of 50M keys created. I
> only want to keep data for 2 days in Redis, hence once I get to day 3 I need
> to delete 50M keys.
>
> I have tried:
>
> 1. Expires - doesn't actually delete the keys as they aren't accessed once
> over 2 days old, hence memory just keeps growing, eventually redis starts
> swapping and the whole thing dies. The background task that tries to delete
> them just can't keep up.
> 2. A set that contains the keys to delete, then a SPOP / DEL loop from the client.
> I can only seem to delete about 200 / s, mostly due to the fact that I need
> a full network round trip for each SPOP and DEL.
> 3. An EVAL script that does the SPOP and DEL on Redis; this blocks the
> whole thread, so Redis stops responding.
As Salvatore already mentioned, expiration should work for you. As an
alternative...
Take your EVAL script and limit it to X items per call. By measuring
the latency of each EVAL call, you can tune X to hit whatever latency
target you like, then run the command repeatedly until there are no
more keys to delete.
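For instance, a minimal sketch of that capped EVAL, assuming redis-py and a Redis build with scripting (the names DRAIN_SCRIPT and drain_set are my own):

```python
# Pop and delete at most ARGV[1] keys per call, so each EVAL's
# latency stays bounded instead of blocking on all 50M at once.
DRAIN_SCRIPT = """
local deleted = 0
for i = 1, tonumber(ARGV[1]) do
    local key = redis.call('SPOP', KEYS[1])
    if not key then break end
    redis.call('DEL', key)
    deleted = deleted + 1
end
return deleted
"""

def drain_set(conn, skey, count=500):
    # Returns how many keys were deleted this call;
    # invoke repeatedly until it returns 0.
    return conn.eval(DRAIN_SCRIPT, 1, skey, count)
```

Start with a small count, watch the call latency, and raise or lower it exactly as described above for X.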
Alternatively, you can use a ZSET with all scores set to 0, have your
client fetch items in blocks of your choosing with ZRANGE and
ZREMRANGEBYRANK, and delete the items in bulk with the
already-mentioned variadic DEL command.
Something like...
import time

def clear_keys(conn, zkey, latency):
    block = 100
    # use a pipeline without MULTI/EXEC
    pipe = conn.pipeline(False)
    def next_block():
        # fetch one block of keys, then drop those entries
        # from the ZSET in the same round trip
        pipe.zrange(zkey, 0, block - 1)
        pipe.zremrangebyrank(zkey, 0, block - 1)
        # [-2] is the ZRANGE result; any pending DELETE
        # rides along in the same pipeline
        return pipe.execute()[-2]
    # start the clock and fetch the first block of items
    start = time.time()
    to_delete = next_block()
    while to_delete:
        # adjust the block size to try to hit
        # the desired latency
        end = time.time()
        if end - start < latency:
            block <<= 1
        else:
            block = max(3 * block >> 2, 100)
        # delete the items and fetch the next block
        start = time.time()
        pipe.delete(*to_delete)
        to_delete = next_block()
The above could also be implemented trivially with EVAL in Lua, and
you could get even finer grained control over latency there.
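For example, a hedged sketch of that Lua variant (ZSET_DRAIN and drain_zset are names of my own invention), doing the ZRANGE / ZREMRANGEBYRANK / DEL entirely server-side in one round trip:

```python
# Remove up to ARGV[1] members from the ZSET and DEL the keys
# they name, all inside a single EVAL call.
ZSET_DRAIN = """
local keys = redis.call('ZRANGE', KEYS[1], 0, tonumber(ARGV[1]) - 1)
if #keys == 0 then return 0 end
redis.call('ZREMRANGEBYRANK', KEYS[1], 0, #keys - 1)
redis.call('DEL', unpack(keys))
return #keys
"""

def drain_zset(conn, zkey, block=100):
    # Returns how many keys were deleted; loop until it returns 0,
    # adjusting block against the measured latency as above.
    return conn.eval(ZSET_DRAIN, 1, zkey, block)
```

One caveat: Lua's unpack is limited by the interpreter's stack (on the order of a few thousand elements), so keep the block size comfortably below that rather than letting it grow unbounded.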
Regards,
- Josiah