Massive delete vs massive expire


Aníbal Rojas

May 26, 2010, 11:06:41 AM
to redi...@googlegroups.com

Hello,

     We are running a large Redis instance, and we want to purge a large set of keys from it. Does expiring the keys at the current time offer any benefit over deleting them immediately?

    The number of keys is pretty large, and we don't want to block the server.

     Thanks in advance.

--
Aníbal Rojas

Salvatore Sanfilippo

May 26, 2010, 11:16:17 AM
to redi...@googlegroups.com

Hello Anibal,

I think in your specific case the best thing to do is to issue DEL
commands if you know all the names of the keys beforehand. Just make
sure to call DEL, for instance, at most 5000 times/sec or so, in
order to avoid using too much bandwidth for this task while the
server is running other operations.
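
For instance, a minimal sketch of such a throttled delete, assuming the redis-py client and that the doomed key names were collected beforehand in a file (both assumptions, not part of the advice itself):

    import time
    import redis  # assumes the redis-py client

    r = redis.Redis(host='localhost', port=6379)
    MAX_PER_SEC = 5000

    # 'keys.txt' is a hypothetical file listing one doomed key per line.
    with open('keys.txt') as f:
        batch = []
        for line in f:
            batch.append(line.strip())
            if len(batch) == MAX_PER_SEC:
                r.delete(*batch)  # DEL accepts multiple keys at once
                batch = []
                time.sleep(1)     # crude throttle: ~5000 deletions/sec
        if batch:
            r.delete(*batch)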

Cheers,
Salvatore

--
Salvatore 'antirez' Sanfilippo
http://invece.org

"Once you have something that grows faster than education grows,
you’re always going to get a pop culture.", Alan Kay

Aníbal Rojas

May 26, 2010, 1:36:44 PM
to redi...@googlegroups.com

Salvatore,

    Thanks for your response. Actually we don't know all the keys, but we can generate all the patterns. Using a KEYS command will be faster for sure, but I am not sure whether it will completely block the Redis instance while executing.

    And yes, I know that the KEYS command is evil, and so on. But as Redis usage grows and keysets grow, an enhanced KEYS command will be required as a means of executing some introspection-heavy tasks.


Marc Byrd

May 26, 2010, 3:20:32 PM
to redi...@googlegroups.com
This brings up my use case for databases (e.g. 0,1,2,3) - one can simply use FLUSHDB if all such keys are in a single database. That would surely be much faster than any KEYS * scan or piecewise deleting.

Is the database notion going to be deprecated? I understood at the Redis meetup last night that its use is discouraged... [The only other utility for it, which may be considerably valuable, is being able to use KEYS across a much smaller scope than the corresponding namespace-limited KEYS FOO*.]

The only other constructive advice I have is that if you do know the names of the keys, then pipelining the deletes would certainly be faster (and lower bandwidth).
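
A sketch of that, assuming the redis-py client (the batch size here is arbitrary):

    import redis  # assumes the redis-py client

    def pipelined_delete(r, keys, batch_size=1000):
        # Queue DELs on a pipeline and flush in batches: one network
        # round trip per batch instead of one per key.
        pipe = r.pipeline(transaction=False)
        for i, key in enumerate(keys, 1):
            pipe.delete(key)
            if i % batch_size == 0:
                pipe.execute()
        pipe.execute()

    pipelined_delete(redis.Redis(), ['old:key:1', 'old:key:2'])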

Cheers,


m


Aníbal Rojas

May 26, 2010, 5:13:03 PM
to redi...@googlegroups.com

Marc,

    We are pipelining, and we are also tracking what we should delete. This is a special case: we changed a key pattern and we want to get rid of an old set of keys.

    This kind of situation will always happen, and an enhanced* version of KEYS (yes, I know this is an anti-pattern) would be pretty useful.

    * When I say enhanced I mean something that allows me to gently walk Redis's keyspace so I can perform maintenance tasks without impacting the instance's performance.


Salvatore Sanfilippo

May 27, 2010, 5:00:53 AM
to redi...@googlegroups.com

Hello Marc,

it's very unlikely that this will be deprecated any time soon: the
feature has been in for too long and there are many people using it
at this point. It's just discouraged when the idea is to run
different applications in different DBs, or when you plan on using
redis-cluster in the future, once it is available.

Your use case seems like a legitimate one indeed: you put part of your
data into a separate DB, and FLUSHDB it when you need to work with a
new set of data.
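
A minimal sketch of that pattern, assuming the redis-py client (the database index is arbitrary):

    import redis  # assumes the redis-py client

    # Keep the short-lived working set in its own database (index 1),
    # separate from the default database 0.
    work = redis.Redis(db=1)
    work.set('batch:item:1', 'value')

    # When the whole set is obsolete, drop the database in one command
    # instead of deleting key by key.
    work.flushdb()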

Maybe different databases in the same instance was not the cleanest
design idea in Redis, but it's not going to disappear overnight, so
feel free to use it. But bear in mind that it is possible that
redis-cluster will not support multiple DBs and SELECT.

Salvatore Sanfilippo

May 27, 2010, 5:04:29 AM
to redi...@googlegroups.com

Hello Anibal,

I think there are much better solutions for your problem than an
O(log(N)+M) KEYS implementation (there is no way to make it any
faster, btw; N is the number of elements and M the matching range). If
the matching range is big you are still too slow.

So what's the solution? There is more than one.

1) Use a set where you add all the keys you may want to remove later,
then fire a worker that uses SRANDMEMBER + DEL + SREM to remove all
the keys in O(N) time (so O(1) for every key, the best you can get).
2) Use a sorted set to index all your "interesting" (subject to mass
removal) key space, and use ZRANGEBYSCORE to get an appropriate range
for deletion with DEL. (A sketch of both follows below.)
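
A minimal sketch of both approaches, assuming the redis-py client; the set name 'doomed', the sorted set 'doomed:by-time', and the cutoff score are all hypothetical:

    import redis  # assumes the redis-py client

    r = redis.Redis()

    # Approach 1: the keys to purge were added to the set 'doomed'
    # beforehand; pull one random member at a time and delete it.
    while True:
        key = r.srandmember('doomed')
        if key is None:
            break              # set empty: all keys purged
        r.delete(key)          # delete the key itself
        r.srem('doomed', key)  # drop it from the tracking set

    # Approach 2: keys indexed in a sorted set with a timestamp score;
    # delete everything older than a cutoff, one chunk at a time.
    cutoff = 1274800000  # hypothetical Unix-time cutoff
    while True:
        chunk = r.zrangebyscore('doomed:by-time', 0, cutoff,
                                start=0, num=1000)
        if not chunk:
            break
        for key in chunk:
            r.delete(key)
            r.zrem('doomed:by-time', key)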

Aníbal Rojas

May 27, 2010, 7:45:32 AM
to redi...@googlegroups.com

Salvatore,

    Actually, yes, I know that using KEYS as a kind of SELECT in Redis is an anti-pattern.

     We are tracking our own keyspace in order to be able to purge old data; we are using a few zsets for this.

    BUT something I can assure you of is that "sh*t happens"; that is the way the world works. A simple mistake, a script that dies without completing its execution, etc. can lead to a polluted keyspace with dangling keys that can't be accessed.

    It is not about making KEYS faster; the Big O calculation doesn't lie. It is about options for exploring the whole keyspace without breaking the Redis instance and/or the client.

    A LIMIT or RANGE option for KEYS is a natural starting point for this discussion, but what about a SLOW option? I am sure it sounds crazy for the Autobahn of KV stores, but a highway also requires maintenance.

   What about KEYS with a PUB <key> option that slowly/gently crawls the keyspace, publishing the results to a SUBScribed consumer? (Maybe interesting in a more general way.)

   My point is not about encouraging anti-patterns in Redis, but about having tools for handling the unexpected, with a kind of introspection ability.

    Best regards,

--
Aníbal

Salvatore Sanfilippo

May 27, 2010, 9:06:52 AM
to redi...@googlegroups.com
2010/5/27 Aníbal Rojas <aniba...@gmail.com>:

>    My point is not about encouraging anti-patterns in Redis, but about having
> tools for handling the unexpected, with a kind of introspection ability.

Ok, now I got your point: you mean, even when everything is designed
the right way, the real world is much more complex than that, and
sometimes you need to be able to introspect databases that are in
production, without delays, and so forth.

I think I've an idea...

in this kind of application maybe it's not necessary to return
elements in a "point in time" fashion. What I mean is, it's probably
OK if I can call a command that returns one key after the other using
a "cursor"; even if the database is subject to writes, I may miss
some key that was added in the meantime (or rehashed for some other
reason).

For instance:

DEBUG KEYINDEX 0 => ["foo",100]
DEBUG KEYINDEX 100 => ["bar", 104]
...
DEBUG KEYINDEX 102303224 => nil

and so forth. The idea is that the user may ask for the next key
starting at a given index inside the hash table. The command returns
the first key found starting at that index, plus the next index to
use in the next DEBUG KEYINDEX call.

When there are no more keys, a nil bulk reply is returned.

Do you think this is able to fix this kind of problem? I like it
because it's very fast and generic, it does not block the DB, and it
is up to the client to do the slow things (like pattern matching
against the key).
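
A sketch of the client-side loop such a command would allow; DEBUG KEYINDEX is hypothetical (the proposal above), and redis-py's execute_command is used to send it raw:

    import fnmatch
    import redis  # assumes the redis-py client

    r = redis.Redis()

    # Walk the keyspace with the hypothetical cursor; pattern matching
    # happens client-side, so the server never blocks.
    cursor = 0
    while True:
        reply = r.execute_command('DEBUG', 'KEYINDEX', cursor)
        if reply is None:
            break  # nil reply: end of the hash table
        key, cursor = reply[0], int(reply[1])
        if fnmatch.fnmatch(key.decode(), 'old:*'):
            r.delete(key)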

Cheers,
Salvatore

--
Salvatore 'antirez' Sanfilippo
http://invece.org

"Once you have something that grows faster than education grows,

Mike Shaver

May 27, 2010, 9:10:59 AM
to redi...@googlegroups.com
On Thu, May 27, 2010 at 9:06 AM, Salvatore Sanfilippo <ant...@gmail.com> wrote:
> Do you think this is able to fix this kind of problems? I like it
> because: it's very fast and generic, does not block the DB, it is up
> to the client to do the slow things (like pattern matching against the
> key).

What about exposing a synthetic set, like %REDIS%KEYS? %REDIS%INFO
could be a hash with the current details in it too; that would reduce
API surface, at the cost of reserving a key prefix. You get all the
SINTER and such goodness, probably pretty efficiently, which would
capture a lot of these reflection needs, I expect.
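
For instance, under that hypothetical scheme (%REDIS%KEYS does not exist; this just illustrates the idea), ordinary set commands could drive the introspection, sketched here with redis-py:

    import redis  # assumes the redis-py client

    r = redis.Redis()

    # Hypothetical: intersect the synthetic key set with one of our
    # own index sets to see which tracked keys actually exist.
    live = r.sinter('%REDIS%KEYS', 'tracked:keys')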

Mike

Aníbal Rojas

May 27, 2010, 10:04:21 AM
to redi...@googlegroups.com


If I like it? I am sawing off my left hand and sending it to you via DHL ;)

Actually, _any_ production-grade Redis instance needs it.


Aníbal Rojas

May 27, 2010, 11:22:45 AM
to redi...@googlegroups.com


Interesting idea. Something that we have discussed in the past on the list is the footprint of the API.


Aníbal Rojas

May 27, 2010, 11:54:59 AM
to redi...@googlegroups.com

Salvatore,

    What about:

    DEBUG KEYS start pattern count limit

    The pattern is a must; usually you want to debug for certain patterns.

    count is the number of keys to traverse, whether they match or not. If this number is reached without a match, only the next starting point is returned.

    And limit is the maximum number of keys to return as positive matches.

    This is the Swiss Army knife of keyspace debugging, with fallback to your proposal via DEBUG KEYS <start> * 1 1.
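
A sketch of how a client might drive that hypothetical command; the reply shape assumed here (matching keys followed by the next starting index, nil when exhausted) is an illustration, not specified anywhere:

    import redis  # assumes the redis-py client

    r = redis.Redis()

    # Hypothetical DEBUG KEYS <start> <pattern> <count> <limit>: walk
    # 500 keys per call, collecting up to 100 matches each time.
    start = 0
    while True:
        reply = r.execute_command('DEBUG', 'KEYS', start, 'old:*', 500, 100)
        if reply is None:
            break  # keyspace exhausted
        matches, start = reply[:-1], int(reply[-1])
        for key in matches:
            r.delete(key)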

