blocked by expire: fix or not?

Salvatore Sanfilippo

May 11, 2012, 12:10:50 PM
to Redis DB
Hello all,

today I updated the latency documentation with a new latency source:
tons of keys expiring at the same moment.

Please see http://redis.io/topics/latency section "Latency generated
by expires", where the issue is explained in detail.

So the issue is that when this happens, Redis currently blocks
everything and expires keys in a busy loop until the fraction of
already expired keys (among all keys with an expire set) drops to 25%
or less. 25% is considered an acceptable memory waste, so at that
point Redis resumes serving queries and expires the remaining keys
incrementally, 100 per second.

The current behavior is:

1) Good because it reclaims memory ASAP.
2) Bad because it blocks the server.
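
The blocking loop described above can be sketched as a small simulation. This is not Redis source code: `sim_db`, `sim_rand`, `sample_and_expire` and `blocking_expire_cycle` are hypothetical names, and the model only tracks counts of keys. The sample size of 10 and the 25% threshold are the values mentioned in this thread.

```c
#include <stdint.h>

#define LOOKUPS_PER_CYCLE 10   /* keys sampled per loop iteration (value from the thread) */

/* Hypothetical model: only counts of keys with an expire set, no real keys. */
typedef struct {
    long live;      /* keys with an expire that are still valid */
    long expired;   /* keys with an expire that are already past it */
    uint32_t seed;  /* state of a tiny deterministic PRNG */
} sim_db;

/* Linear congruential generator, good enough for a simulation. */
static uint32_t sim_rand(sim_db *db) {
    db->seed = db->seed * 1664525u + 1013904223u;
    return db->seed >> 8;
}

/* Sample up to LOOKUPS_PER_CYCLE keys at random and delete the expired ones.
 * Returns how many of the sampled keys were expired. */
static int sample_and_expire(sim_db *db) {
    int found = 0;
    for (int i = 0; i < LOOKUPS_PER_CYCLE; i++) {
        long total = db->live + db->expired;
        if (total == 0) break;
        if ((long)(sim_rand(db) % (uint32_t)total) < db->expired) {
            db->expired--;
            found++;
        }
    }
    return found;
}

/* Keep looping, without serving any client, while more than a quarter of
 * the sample was expired (LOOKUPS_PER_CYCLE / 4 is 2 with integer division,
 * so the loop repeats when 3 or more sampled keys were expired).
 * Returns the number of iterations spent in the busy loop. */
long blocking_expire_cycle(sim_db *db) {
    long iterations = 0;
    int found;
    do {
        found = sample_and_expire(db);
        iterations++;
    } while (found > LOOKUPS_PER_CYCLE / 4);
    return iterations;
}
```

In this model, a database where every key with an expire is already expired keeps the loop spinning until all of them are gone, which is the situation that blocks the server. A database with no expired keys leaves the loop after a single iteration.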

However, there is a way to modify it. Given that the expire cycle is run
every 100 milliseconds, we could tell Redis, for instance, that even
when there are more and more keys to expire, it should not use more
than, let's say, 25 milliseconds for busy looping, and should use the
remaining 75 ms to serve clients. This way it will take 4 times as long
as before to reclaim the memory, but in the meantime Redis will be able
to serve clients.
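
The arithmetic behind "4 times" can be written down as follows. The helper names are hypothetical, and the model makes the simplifying assumption that expiry work proceeds at the same rate whether or not it is interrupted.

```c
/* Hypothetical helpers: time-share arithmetic for a time-limited expire cycle. */

/* How many times longer reclamation takes when only budget_ms out of every
 * period_ms may be spent expiring keys, compared to an unbounded busy loop. */
double reclaim_slowdown(double budget_ms, double period_ms) {
    return period_ms / budget_ms;
}

/* Fraction of each period that is left for serving clients. */
double serving_fraction(double budget_ms, double period_ms) {
    return (period_ms - budget_ms) / period_ms;
}
```

With a 25 ms budget in a 100 ms period, `reclaim_slowdown(25, 100)` gives 4 and `serving_fraction(25, 100)` gives 0.75, so under this assumption a 2 second block would instead be spread over roughly 8 seconds during which clients are still served.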

The current behavior is simpler and more deterministic, but the
modified behavior may be preferable. I would like to start a
discussion about this with the interested parties in order to see what
we can do.

In the meantime, this is how to reproduce the problem in Redis >= 2.6-rc3

Start a spare Redis instance, then fill it with:

$ redis-benchmark -P 32 -q -r 1000000 -n 1000000 set k:rand:000000000000 value
set k:rand:000000000000 value: 275482.09 requests per second
$ redis-cli dbsize
(integer) 632486

Set an expire on all these keys, at the exact same moment:

$ tclsh
% expr [clock milliseconds]+60000
1336752485417
% exit
$ redis-benchmark -P 32 -q -r 1000000 -n 1000000 pexpireat k:rand:000000000000 1336752485417
pexpireat k:rand:000000000000 1336752485417: 257997.94 requests per second

Now run redis-cli --latency and wait up to 60 seconds:

min: 0, max: 1964, avg: 0.50 (15850 samples)

The server blocked for almost 2 seconds.

Cheers,
Salvatore

--
Salvatore 'antirez' Sanfilippo
open source developer - VMware
http://invece.org

Beauty is more important in computing than anywhere else in technology
because software is so complicated. Beauty is the ultimate defence
against complexity.
       — David Gelernter

Salvatore Sanfilippo

May 11, 2012, 12:17:50 PM
to Redis DB
Oh well you can fill the instance simply with:

redis-benchmark -P 32 -q -r 1000000 -n 1000000 setex k:rand:000000000000 20 value

The problem is bad enough that you don't need millisecond resolution
to block it.

Salvatore

Alexander Gladysh

May 11, 2012, 12:20:08 PM
to redi...@googlegroups.com
Make that configurable? IMHO, both scenarios (speed preference, and
memory preference) are viable in different settings.

My 2c,
Alexander.

Salvatore Sanfilippo

May 11, 2012, 12:24:55 PM
to redi...@googlegroups.com
On Fri, May 11, 2012 at 6:20 PM, Alexander Gladysh <agla...@gmail.com> wrote:

> Make that configurable? IMHO, both scenarios (speed preference, and
> memory preference) are viable in different settings.

I think that's "internals" enough that there should be a single
developer-picked compromise. It's too complex and too much of an
implementation detail to let the user pick the right setting.

Thanks,

bugant

May 11, 2012, 12:27:39 PM
to redi...@googlegroups.com
On Fri, May 11, 2012 at 6:24 PM, Salvatore Sanfilippo <ant...@gmail.com> wrote:
> I think that's "internals" enough that there should be a single
> developer-picked compromise. It's too complex and too much of an
> implementation detail to let the user pick the right setting.

I do agree. Moreover, I think the fix you proposed would be the
best to have: memory will be released more slowly, but it will be
released anyway, and in the meantime Redis will continue to serve
requests (which is a key factor to me).

matteo.

Alexander Gladysh

May 11, 2012, 12:29:58 PM
to redi...@googlegroups.com
On Fri, May 11, 2012 at 8:24 PM, Salvatore Sanfilippo <ant...@gmail.com> wrote:
> On Fri, May 11, 2012 at 6:20 PM, Alexander Gladysh <agla...@gmail.com> wrote:
>
>> Make that configurable? IMHO, both scenarios (speed preference, and
>> memory preference) are viable in different settings.
>
> I think that's "internals" enough that there should be a single
> developer-picked compromise. It's too complex and too much of an
> implementation detail to let the user pick the right setting.

In that case I'm for changing the implementation to limit the maximum
time spent in key collection. Otherwise, as I understand it, there would
not be any way to guarantee Redis responsiveness.

Alexander.

Salvatore Sanfilippo

May 11, 2012, 12:33:54 PM
to redi...@googlegroups.com
On Fri, May 11, 2012 at 6:29 PM, Alexander Gladysh <agla...@gmail.com> wrote:
> In that case I'm for changing the implementation to limit the maximum
> time spent in key collection. Otherwise, as I understand it, there would
> not be any way to guarantee Redis responsiveness.

Exactly, there is no other way to guarantee that Redis will be
responsive.
Still, it will be less responsive from a certain point of view: many
queries will start showing a latency of up to 25 milliseconds, and the
total capacity to reply to queries will be limited to 75% of normal.
But it's still much better than nothing IMHO.

Salvatore Sanfilippo

May 11, 2012, 12:46:33 PM
to Redis DB
Just tried this patch, and it makes a huge difference in behavior. It
seems like the way to go (it needs some minor reworking, like tuning
that 0xff and turning 25 into a define).

diff --git a/src/redis.c b/src/redis.c
index d4d91f1..3b75f34 100644
--- a/src/redis.c
+++ b/src/redis.c
@@ -622,9 +622,10 @@ void updateDictResizePolicy(void) {
  * keys that can be removed from the keyspace. */
 void activeExpireCycle(void) {
     int j;
+    long long start = mstime();
 
     for (j = 0; j < server.dbnum; j++) {
-        int expired;
+        int expired, iteration = 0;
         redisDb *db = server.db+j;
 
         /* Continue to expire if at the end of the cycle more than 25%
@@ -653,6 +654,10 @@ void activeExpireCycle(void) {
                     server.stat_expiredkeys++;
                 }
             }
+            /* From time to time check if we ran out of time, and return to the
+             * caller if we already used more than 25 milliseconds of time. */
+            iteration++;
+            if ((iteration & 0xff) == 0 && (mstime()-start) > 25) return;
         } while (expired > REDIS_EXPIRELOOKUPS_PER_CRON/4);
     }
 }
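
The idea in the patch, a time budget checked only every 256 iterations, can be tried outside Redis with a self-contained sketch. Everything here is hypothetical and for illustration only: `run_with_budget`, the `work_fn` and `clock_fn` callbacks, and the simulated context with its fake clock are not part of Redis, and the 10 microseconds per unit of work is an arbitrary assumption.

```c
#include <stdbool.h>

#define TIME_LIMIT_MS 25    /* budget used in the patch */
#define CHECK_MASK    0xff  /* read the clock once every 256 iterations */

/* Hypothetical callbacks so the sketch does not depend on Redis internals. */
typedef long long (*clock_fn)(void *ctx);  /* current time in milliseconds */
typedef bool (*work_fn)(void *ctx);        /* one unit of work; false when done */

/* Run work until it reports completion or the time budget is exhausted.
 * Returns true if the work finished, false if we ran out of time. */
bool run_with_budget(work_fn work, clock_fn now, void *ctx) {
    long long start = now(ctx);
    int iteration = 0;

    while (work(ctx)) {
        iteration++;
        /* (iteration & CHECK_MASK) == 0 holds once every 256 iterations,
         * so the cost of reading the clock is amortized. */
        if ((iteration & CHECK_MASK) == 0 && now(ctx) - start > TIME_LIMIT_MS)
            return false;
    }
    return true;
}

/* Simulated environment: each unit of work advances a fake clock. */
typedef struct {
    long remaining;      /* units of work left */
    long long fake_us;   /* fake clock, in microseconds */
    long clock_reads;    /* how many times the clock was read */
} sim_ctx;

bool sim_work(void *p) {
    sim_ctx *c = p;
    if (c->remaining == 0) return false;
    c->remaining--;
    c->fake_us += 10;    /* assumed cost of one unit: 10 microseconds */
    return true;
}

long long sim_now(void *p) {
    sim_ctx *c = p;
    c->clock_reads++;
    return c->fake_us / 1000;
}
```

Calling `run_with_budget(sim_work, sim_now, &ctx)` with a large `remaining` value returns false once the fake clock passes the budget, while the clock is read only a handful of times. The trade-off is that the loop can overshoot the budget by up to 255 iterations before it notices.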

Alexander Gladysh

May 11, 2012, 12:50:13 PM
to redi...@googlegroups.com
On Fri, May 11, 2012 at 8:46 PM, Salvatore Sanfilippo <ant...@gmail.com> wrote:
> Just tried this patch, and it makes a huge difference in behavior. It
> seems like the way to go (it needs some minor reworking, like tuning
> that 0xff and turning 25 into a define).
>
> diff --git a/src/redis.c b/src/redis.c
> index d4d91f1..3b75f34 100644
> --- a/src/redis.c
> +++ b/src/redis.c
> @@ -622,9 +622,10 @@ void updateDictResizePolicy(void) {
>  * keys that can be removed from the keyspace. */
>  void activeExpireCycle(void) {
>     int j;
> +    long long start = mstime();
>
>     for (j = 0; j < server.dbnum; j++) {
> -        int expired;
> +        int expired, iteration = 0;
>         redisDb *db = server.db+j;
>
>         /* Continue to expire if at the end of the cycle more than 25%
> @@ -653,6 +654,10 @@ void activeExpireCycle(void) {
>                     server.stat_expiredkeys++;
>                 }
>             }
> +            /* From time to time check if we ran out of time, and return to the
> +             * caller if we already used more than 25 milliseconds of time. */
> +            iteration++;
> +            if ((iteration & 0xff) == 0 && (mstime()-start) > 25) return;

Is there a point in not writing more readable (iteration % 255) == 0 here?

>         } while (expired > REDIS_EXPIRELOOKUPS_PER_CRON/4);
>     }
>  }

Alexander.

Salvatore Sanfilippo

May 11, 2012, 12:52:09 PM
to redi...@googlegroups.com
On Fri, May 11, 2012 at 6:50 PM, Alexander Gladysh <agla...@gmail.com> wrote:

> Is there a point in not writing more readable (iteration % 255) == 0 here?

Just that you have to trust the compiler turning it into & 0xff,
otherwise it's much slower ;)

But I bet that anyone who does not understand "& 0xff" does not
understand "% 255" either.

Alexander Gladysh

May 11, 2012, 12:58:30 PM
to redi...@googlegroups.com
On Fri, May 11, 2012 at 8:52 PM, Salvatore Sanfilippo <ant...@gmail.com> wrote:
> On Fri, May 11, 2012 at 6:50 PM, Alexander Gladysh <agla...@gmail.com> wrote:
>
>> Is there a point in not writing more readable (iteration % 255) == 0 here?
>
> Just that you have to trust the compiler turning it into & 0xff,
> otherwise it's much slower ;)
>
> But I bet that anyone who does not understand "& 0xff" does not
> understand "% 255" either.

The point is not in understanding, but in the amount of brainpower
required to figure out what is going on. But that's nitpicking anyway,
so never mind.

Alexander.

Salvatore Sanfilippo

May 11, 2012, 1:00:57 PM
to redi...@googlegroups.com
On Fri, May 11, 2012 at 6:58 PM, Alexander Gladysh <agla...@gmail.com> wrote:
> The point is not in understanding, but in the amount of brainpower
> required to figure out what is going on. But that's nitpicking anyway,
> so never mind.

I'll make sure to add a comment that makes it more clear indeed, thank you.

Josiah Carlson

May 11, 2012, 3:21:53 PM
to redi...@googlegroups.com
Is mstime() expensive compared to everything else that is going on in
the loop? If not, check it every time. Otherwise, looks good to me.
(I'm also a fan of using & (power of 2 - 1) for these kinds of things
- in Python we can't rely on the compiler to take % 256 and convert it
into & 255).

Regards,
- Josiah

alex

May 12, 2012, 12:46:38 AM
to redi...@googlegroups.com
mstime() is expensive, it will call the gettimeofday system call
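
For reference, a millisecond clock built on gettimeofday() could look like the sketch below. This is an assumption about the general shape of such a helper, not a copy of the Redis function, and `mstime_sketch` is a hypothetical name.

```c
#include <stddef.h>
#include <sys/time.h>

/* Hypothetical helper: wall-clock time in milliseconds since the Unix epoch.
 * Every call performs one gettimeofday() call. */
long long mstime_sketch(void) {
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (long long)tv.tv_sec * 1000 + tv.tv_usec / 1000;
}
```

How costly a single call is depends on the platform, so the safe reading of the point above is simply that it should not be called on every iteration of a tight loop.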

Salvatore Sanfilippo

May 12, 2012, 8:08:48 AM
to redi...@googlegroups.com
On Sat, May 12, 2012 at 6:45 AM, alex <alex....@gmail.com> wrote:

> Actually mstime() is expensive, it will call the gettimeofday() system call

Exactly, it's a pretty fast syscall (as in, you can call it many
million times per second), but not fast enough to be called in a busy
loop like that, especially since it does not make sense to check more
often than a few times every millisecond.

Our current mask is 0xff, so the check fires once every 256 iterations,
and this needs to be multiplied by the 10 lookups we perform in every
iteration of the loop.
So currently we are calling mstime() once every 2560 key lookups,
which is fair enough. It's still probably called too often, but not
often enough to create a performance issue.
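
The 2560 figure follows from the mask and the sample size. A tiny helper, with a hypothetical name, makes the relationship explicit and is handy when tuning the mask:

```c
/* Hypothetical helper: key lookups performed between two clock reads when
 * the clock is read only when (iteration & mask) == 0. The mask is assumed
 * to be a power of two minus one, such as 0xff. */
long lookups_between_checks(unsigned mask, long lookups_per_iteration) {
    return ((long)mask + 1) * lookups_per_iteration;
}
```

`lookups_between_checks(0xff, 10)` is 2560, and a smaller mask such as 0xf would give 160 lookups between clock reads.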

Salvatore Sanfilippo

May 12, 2012, 8:14:58 AM
to redi...@googlegroups.com
On Fri, May 11, 2012 at 9:21 PM, Josiah Carlson
<josiah....@gmail.com> wrote:

> in Python we can't rely on the compiler to take % 256 and convert it
> into & 255

GCC outputs exactly the same machine code for & 0xff and % 256. It
has done this for ages, I think: I remember this was already
optimized at least 17 years ago, when I learned C for the first time,
and this was one of the things I tried when I discovered "gcc -S".
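
A quick way to convince yourself of the equivalence, at least for the non-negative counters used here, is the sketch below. The function names are hypothetical. It also shows that the "% 255" spelling used earlier in the thread is a different test: only "% 256" matches "& 0xff".

```c
#include <stdbool.h>

/* Three candidate "is it time to check the clock?" tests. */
bool due_mask(int i)   { return (i & 0xff) == 0; }
bool due_mod256(int i) { return (i % 256) == 0; }
bool due_mod255(int i) { return (i % 255) == 0; }

/* Count the values in [0, limit) where the mask form and the % 256 form
 * give different answers. */
int count_disagreements(int limit) {
    int n = 0;
    for (int i = 0; i < limit; i++)
        if (due_mask(i) != due_mod256(i)) n++;
    return n;
}
```

`count_disagreements(1000000)` is 0, while `due_mod255(255)` is true and `due_mask(255)` is false, so a "% 255" version would fire on a different schedule than the patch.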

Jokea

May 12, 2012, 10:05:30 AM
to redi...@googlegroups.com
I think the answer is pretty clear.
Most people use Redis because it has:
1. rich data structures, and
2. high performance.
Blocking Redis is something we always try to avoid.

