Cascade deletes in Redis

teleo

Jul 10, 2010, 4:52:12 PM
to Redis DB
In practice, data of the same 'object' in Redis is often stored in
multiple keys, to optimize for various retrieval scenarios. For
example, for the top-level key 'users:john' there could be multiple
secondary keys such as 'users:john:activity'.

This raises a concern about 'garbage' data accumulating over time due
to incomplete deletion (e.g. when the process calling Redis hits an
exception midway through).

How do you address this problem today?

The problem can perhaps be handled semi-automatically for cases with
top-level keys like the example with 'users:john'. In such cases, it
would be useful to have the secondary keys expire automatically when
the top-level key expires. For example, this could be expressed via
the following hypothetical syntax:

EXPIRE users:john:activity WITH users:john

This is similar to (though less powerful than) cascading deletes (ON
DELETE CASCADE) in SQL databases.
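
Lacking such a command, an application can at least centralize the
cleanup. A minimal sketch in Python (the helper name and the
redis-py-style delete(*keys) call are illustrative assumptions, not
part of any existing API):

```python
def cascade_delete(client, root, suffixes):
    """Delete a top-level key together with its known secondary keys.

    `client` is any object exposing a redis-py style delete(*keys)
    method; returns the number of keys actually removed.
    """
    # Build the full key list: the root plus each 'root:suffix' key.
    keys = [root] + ["%s:%s" % (root, s) for s in suffixes]
    return client.delete(*keys)
```

For example, cascade_delete(r, "users:john", ["activity"]) would issue
a single DEL covering both users:john and users:john:activity, so the
object is removed atomically or not at all.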

What do you think?

Tobias Petry

Jul 11, 2010, 6:41:00 AM
to Redis DB
Would a hash for the top-level key user:john be a practicable
solution? If not, I can think of emulating expiry: keep a deletion
queue in a sorted set and check every x seconds whether an entry is
due for deletion. Each queue entry points to a hash grouping all the
keys that should be deleted together:

SET users:john foobar
SET users:john:activity foobar

HMSET items_john users:john 1 users:john:activity 1
ZADD deletion_queue <unixtimestamp> items_john

ZRANGEBYSCORE deletion_queue 0 <unixtimestamp>
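
The polling step of this scheme can be sketched in Python; here a
plain list of (timestamp, member) pairs stands in for the sorted set,
and the function mirrors ZRANGEBYSCORE deletion_queue 0 <now>:

```python
import time

def due_entries(queue, now=None):
    """Return members of the deletion queue whose score (a unix
    timestamp) has passed, oldest first -- these are the groups of
    keys that are due for deletion."""
    if now is None:
        now = time.time()
    # Sorting by score reproduces the sorted set's ordering.
    return [member for score, member in sorted(queue) if score <= now]
```

A background worker would call this every x seconds, then delete each
returned group and remove its entry from the queue.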

Josiah Carlson

Jul 11, 2010, 3:21:08 PM
to redi...@googlegroups.com
I've done this before, though usually a bit differently.

Instead of storing (for example) 'users:john', I store 'users:john:' --
note the trailing colon. Whenever I need to delete a set of items,
I push the 'to delete' root key (in this case 'users:john:') onto a
list. When deleting, I either generate the list of keys from the known
suffixes of the root key or, in some cases, just use KEYS with a
wildcard, and pass the resulting list to DEL.

Regards,
- Josiah
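
The suffix-expansion variant of this might look like the following
sketch (the function name is illustrative, not from the original
post):

```python
def keys_for_root(root, suffixes):
    """Expand a trailing-colon root key such as 'users:john:' into the
    full list of key names to hand to DEL, root key included."""
    return [root] + [root + suffix for suffix in suffixes]
```

Because the root already ends in a colon, plain concatenation yields
well-formed keys like 'users:john:activity' with no wildcard scan
needed.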


Aníbal Rojas

Jul 11, 2010, 3:33:19 PM
to redi...@googlegroups.com
Josiah,

> Instead of storing (for example) 'users:john', I store 'users:john:',
> notice the trailing colon.  Whenever I need to delete a set of items,
> I throw the 'to delete' root key (in this case 'users:john:') into a
> list.  When deleting, I either generate the list of keys from known
> suffixes of the root key, or in some cases, just use KEYS with a
> wildcard, and pass that to delete.

Using KEYS with a pattern in this fashion can lead to performance
issues -- remember that it makes Redis walk the *whole* keyspace.

--
Aníbal Rojas
Ruby on Rails Web Developer
http://www.google.com/profiles/anibalrojas

Josiah Carlson

Jul 11, 2010, 3:46:40 PM
to redi...@googlegroups.com
I know, which is why I said "in some cases". For me, it depends on
the number of keys and whether or not I have recently removed known
attributes. Far more often I just use hashes, which makes cleanup a
single DEL (and I can pull in all of the data in a pre-structured way
using HGETALL).

- Josiah
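
For illustration, the hash variant keeps the whole object under one
key (the field names here are made up):

```
HMSET users:john name "John" activity "logged in"
HGETALL users:john      (returns every field and value in one call)
DEL users:john          (the entire object is gone in one delete)
```

Since all the data lives under a single key, there are no secondary
keys to leak in the first place.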


Damian Janowski

Jul 11, 2010, 9:34:41 PM
to redi...@googlegroups.com
On Sat, Jul 10, 2010 at 5:52 PM, teleo <lev....@gmail.com> wrote:
> In practice, data of the same 'object' in Redis is often stored in
> multiple keys, to optimize for various retrieval scenarios. For
> example, for the top-level key 'users:john' there could be multiple
> secondary keys such as 'users:john:activity'.
>
> This raises a concern about 'garbage' data accumulating over time, due
> to incomplete deletion (e.g. due to exceptions in the process that
> calls Redis).

Since you say that the problem is about clients that don't finish
their task, I think you'll agree that performing the necessary deletes
is not Redis' concern but your application's.

If you want to make sure your application finishes a group of
operations, use MULTI/EXEC:
http://code.google.com/p/redis/wiki/MultiExecCommand. If your client
dies while running these commands, they won't be executed at all, so
you won't end up with inconsistent data.
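
For the cascade-delete case discussed above, grouping the deletes in a
transaction would look like this (key names follow the earlier
example):

```
MULTI
DEL users:john
DEL users:john:activity
EXEC
```

Either the client reaches EXEC and both deletes run, or it dies
beforehand and neither does -- no half-deleted object remains.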

teleo

Jul 12, 2010, 7:58:49 PM
to Redis DB
Thanks to everyone for answering.

What kind of isolation does MULTI-EXEC provide? Do other clients see
data not 'committed' with EXEC?

Damian Janowski

Jul 12, 2010, 9:07:22 PM
to redi...@googlegroups.com
On Mon, Jul 12, 2010 at 8:58 PM, teleo <lev....@gmail.com> wrote:
> Thanks to everyone for answering.
>
> What kind of isolation does MULTI-EXEC provide? Do other clients see
> data not 'committed' with EXEC?

No -- when you run MULTI, all subsequent commands are queued on the
server; only when you run EXEC do those commands actually execute.
Does that make sense?
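
The queueing is visible in a redis-cli session; the replies below are
illustrative (the integer results depend on whether the keys existed):

```
redis> MULTI
OK
redis> DEL users:john
QUEUED
redis> DEL users:john:activity
QUEUED
redis> EXEC
1) (integer) 1
2) (integer) 1
```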

Mason Jones

Jul 12, 2010, 11:32:33 PM
to redi...@googlegroups.com

Which does essentially provide the same functionality: no other
clients will see your updates, because the commands are executed in
"isolation" thanks to Redis being single-threaded. So it depends on
your definition of isolation, I suppose, but it should most likely
give you what you need.

teleo

Jul 13, 2010, 5:46:54 AM
to Redis DB
Thanks, everybody.

I need to clarify one thing, though. The fact that Redis is
single-threaded does not *on its own* mean that requests from other
clients are not executed: some event-driven servers implement a kind
of cooperative multitasking, so that no client is blocked for too
long.

My question is: when the queued commands are finally EXECuted, is it
guaranteed that no client sees any intermediate state -- any state
after EXEC has started but before it has ended?

If so, there is good isolation. The downside, however, is that very
large MULTI-EXECs could cause throughput problems.



Aníbal Rojas

Jul 13, 2010, 8:26:47 AM
to redi...@googlegroups.com


You can have any number of concurrent connections to Redis, but being single-threaded means that no "read" command will be executed while your MULTI/EXEC is executing. And yes, _any_ slow command can severely affect performance, queueing all subsequent commands until it finishes.

