The stack traces are very helpful, thank you. I'm investigating the issue.
Cheers,
Pieter
> --
> You received this message because you are subscribed to the Google Groups
> "Redis DB" group.
> To view this discussion on the web visit
> https://groups.google.com/d/msg/redis-db/-/Ei3MZy24CaMJ.
> To post to this group, send email to redi...@googlegroups.com.
> To unsubscribe from this group, send email to
> redis-db+u...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/redis-db?hl=en.
Hello Pieter, I was starting to take a look as well. This is probably
related to my fix for the BRPOPLPUSH replication bug, but I'm not 100%
sure.
From the trace it looks very likely to be the same bug, btw, with
incrRefCount() called against an already freed or otherwise invalid
object.
Cheers,
Salvatore
--
Salvatore 'antirez' Sanfilippo
open source developer - VMware
http://invece.org
"We are what we repeatedly do. Excellence, therefore, is not an act,
but a habit." -- Aristotle
Thanks,
Salvatore
On Wed, Jun 29, 2011 at 9:35 AM, Matt Billenstein <mbil...@gmail.com> wrote:
> Thanks much guys.
> m
>
Thank you. Are you using a timeout of 0 (forever), or is there an
actual timeout? Sorry for all the questions, but this helps!
Salvatore
Pieter: note that...
connected_clients:2, blocked_clients:5
so it is in the unregistering code, I guess.
And this is probably why the client has a target key: it is simply
reused memory and that field is not NULL.
Cheers,
Very interesting, so it is not "new iterator" related, and yet it
survived the refactoring of this part of the code.
Since you are so kind, could you please give the latest unstable branch a try?
Thanks,
Salvatore
The weird thing here is that "server.bpop_blocked_clients" is only
touched in two places: one where it is incremented (upon blocking the
client) and one where it is decremented (when a value is pushed to
unblock it, or the client releases its connection). The decrement is
apparently not done, which leads me to believe the "flags" field on the
client struct is somehow changed to mark the client as not blocking,
which in turn causes the blocking keys dictionary to not be cleaned up.
Matt: can you somehow trace which commands are executed here (since
there are only about 10), and where they originate? It would also be
helpful to know if clients are randomly dropping connections or
something similar.
Thanks,
Pieter
Agreed... I tried different scripts simulating a producer/consumer
with BRPOP; there is no way I can reproduce it, even when implementing
random connection dropping.
I think that at this point the simplest thing to do, since Matt is
able to reproduce the bug in very little time, is to ask him to compile
Redis with:
make noopt
and then run it under valgrind and send us the valgrind trace, which
will finally show where the issue happens.
Cheers,
Salvatore
Cheers,
Pieter
It appears that a client could potentially be "blocked" more than once
(without being unblocked in-between) in unstable, so that if you send
two blocking commands after each other (without waiting for a reply) the
above could happen (including all errors) because only the last one will
unblock cleanly later. This can happen because unstable keeps running
commands even when a client is blocked, due to this deleted line when
diskstore was removed:
https://github.com/antirez/redis/commit/c9d0c3623a7714bd41a35237f4ba927206a7adb6#L8L800.
I think that can be a problem. We probably still need "if (c->flags &
REDIS_BLOCKED) return;" there, unless I'm missing something.
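The missing guard can be sketched with a toy model (a simplified struct and a made-up flag value, not the real Redis code): with the check in place, a second blocking command arriving while the client is already blocked is simply not executed yet, so the client cannot be blocked twice without an unblock in between.

```c
#include <assert.h>

/* Hypothetical flag value for illustration only. */
#define REDIS_BLOCKED 16

typedef struct {
    int flags;
    int blocked_count; /* how many times this client ended up blocked */
} fakeClient;

/* Stand-in for the blocking path of BLPOP/BRPOP. */
static void blockForKeys(fakeClient *c) {
    c->flags |= REDIS_BLOCKED;
    c->blocked_count++;
}

/* Stand-in for the command-processing loop: with the guard enabled, a
 * command received while the client is already blocked is deferred. */
static void processCommand(fakeClient *c, int guard_enabled) {
    if (guard_enabled && (c->flags & REDIS_BLOCKED)) return;
    blockForKeys(c);
}

/* How many times does the client get blocked after two back-to-back
 * blocking commands (pipelined, no reply awaited in between)? */
int blocked_times_after_two_commands(int guard_enabled) {
    fakeClient c = {0, 0};
    processCommand(&c, guard_enabled);
    processCommand(&c, guard_enabled);
    return c.blocked_count;
}
```

Without the guard the counter goes out of sync exactly as described above, which would also explain blocked_clients exceeding connected_clients.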
This still doesn't explain the current issue for other versions and in
fact I doubt it is the cause of any of these problems (but it's still a
potential problem). It could be something similar, though. Perhaps there
is some other way that blockForKeys() can be called more than once
without unblockClientWaitingData() being called in between... Either
that, or blocked clients aren't cleaned up properly, as you said.
Matt, do you use pipelining in any way?
Regards,
Hampus
About your finding, I'll check that as well and fix if needed, thanks
for investigating.
Cheers,
Salvatore
BRPOP and BLPOP are replicated without being translated into an
RPOP/LPOP, and there is no special check, so this client gets queued
among the blocked clients, but it is actually always the same "fake"
client used to load the AOF. So at the first push, booooooommmmmmm.
Fixing it today and releasing a new Redis version.
Matt, Pieter, Hampus: thanks for your assistance.
Salvatore
Investigating further to understand how a BRPOP can be written to the
AOF in such a way that, when it is processed, the target list is empty.
Salvatore
On Wed, Jun 29, 2011 at 1:33 PM, Hampus Wessman
Hello Matt, if my findings are right, the reason you can't reproduce
this again is that you deleted or rewrote your AOF file.
Apparently your test Redis instance somehow generated (in a way that is
currently not clear) an AOF file where a BRPOP was present without a
corresponding target list element. This corrupted the internal state
of every Redis instance you were trying to test.
Once the corrupted AOF was no longer loaded by the instances, the bug
disappeared.
Now the new problem is to understand how an AOF with this
characteristic was generated.
Cheers,
Salvatore
Update about the issue. It is still not clear how the corrupted AOF
was generated (I put an entry on my TODO list to perform further
tests); btw, I just patched Redis 2.2, 2.4 and unstable to report
this problem while loading the AOF, instead of crashing at random
later at runtime.
Analyzing the offending AOF file may be of some help; however, I think
Matt is unlikely to still have it at hand, since the AOF file was not
suspected at all during the analysis... but if the contrary is true I'll
be glad to get a copy ;)
Cheers,
Salvatore
I found out how these commands (probably) got there too. It is easy to
reproduce it all by running the following commands and then restarting
Redis after the expiration has taken place. Make sure the expiration
happens after all three commands are run and before Redis is started
again.
RPUSH test value
EXPIRE test 10
BLPOP test 0
Redis doesn't write BLPOP to the AOF if it blocks, but if it doesn't
block it is written to the append-only file. This is not a problem if it
doesn't block when loaded either. With an expiration like the above it
will block on load (because the key has expired and doesn't exist
anymore), but not when run the first time (it hasn't expired yet), and
this difference is what triggers the bug. Transforming blocking pops
into non-blocking pops when writing to the AOF should solve it, as
suggested earlier.
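The asymmetry just described can be modeled in a few lines (hypothetical helper names, not Redis internals): the BLPOP is logged precisely because it did not block at runtime, yet it blocks when replayed after the expiration has taken place.

```c
#include <assert.h>

/* Toy model of the BLPOP/AOF asymmetry. The key exists until
 * expire_at (a unix timestamp); both helpers are illustrative
 * stand-ins, not actual Redis functions. */

/* Runtime: a BLPOP issued at time t pops immediately (and is therefore
 * appended to the AOF as-is) iff the key still exists. */
int blpop_logged(long long t, long long expire_at) {
    return t < expire_at; /* didn't block -> written to the AOF */
}

/* AOF load at time t: replaying the logged BLPOP blocks iff the key
 * has expired by now, which would block the fake loading client and
 * corrupt the server's internal state. */
int blpop_blocks_on_load(long long t, long long expire_at) {
    return t >= expire_at;
}
```

The dangerous case is exactly the one in the repro: logged at t=5 with a key expiring at t=10, replayed after a restart at t=20.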
Hampus
RPUSH test foo
EXPIRE test 10
RPUSH test bar
SHUTDOWN
and then restart Redis after 'test' has expired, you will find that
'test' is not empty. While loading the append-only file, 'test' is
deleted when the EXPIRE command is loaded, and then the next command
creates a new list with one item in it. This is not what we would
expect, and it differs from what would happen if Redis were running when
the expiration happened: in that case the whole key would be deleted
after 10 seconds.
It turns out that solving this problem is rather easy and it
automatically solves the other problem too :) I created a branch with
the small fix (two lines of code changed) and a new test case. The fix
makes sure keys aren't expired while we are loading. They will be
expired after the load instead, just like if we hadn't restarted Redis.
The branch is on GitHub: https://github.com/hampus/redis/tree/fixaofexpire
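A minimal sketch of the idea behind the fix, with simplified stand-ins for the Redis internals (the real change guards the expiration check on server.loading):

```c
#include <assert.h>

/* Sketch of the two-line fix: expireIfNeeded() must not delete keys
 * while the AOF is loading; they expire lazily after the load, just as
 * if the server had never been restarted.
 *
 * expire_at: unix time of expiration (-1 means no expiration set).
 * Returns 1 if the key should be deleted now. */
int expire_if_needed(int loading, long long expire_at, long long now) {
    if (expire_at == -1 || now < expire_at) return 0; /* not expired */
    if (loading) return 0; /* the fix: never expire while loading */
    return 1;              /* normal path: delete the expired key */
}
```

With the guard, the EXPIRE in the repro above just sets an expiration time during load; the actual deletion happens after loading, so the subsequent RPUSH in the AOF replays against the still-existing list, matching what a non-restarted server would have done.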
Cheers,
Hampus
* What happens in the following scenario:
1) We are reading an AOF file.
2) SETEX FOO 5 BAR
3) APPEND FOO ZAP
What happens if, for some reason (system under huge load or similar),
too much time passes between steps 2 and 3? We should prevent expires
while the AOF is loading.
I made a note of this and then forgot about it afterward...
I'll check this whole email and the fix on Monday. Thank you!
Salvatore
I studied the issue a bit; your patch is in the right direction, but I
think it is not enough, for the following reason:
EXPIRE key -1 (and in general any value < 0) has the effect of deleting the key.
The problem is, we replicate it as EXPIREAT, while we should
explicitly replicate it untouched as EXPIRE key -1.
Otherwise, with your fix preventing expires with a negative value from
being executed while loading the AOF, we get an inconsistency.
So the right fix should be to replicate EXPIRE untouched for negative
TTLs (since it's a DEL), and also execute these calls when loading the
AOF. We also need to think about EXPIREAT in this context, and in the
context of replication; maybe we have some race there as well... I'll
write a better email about this whole issue tomorrow, this one is just
to check whether you also think there is an issue here.
Cheers,
Salvatore
On Sun, Jul 3, 2011 at 12:48 PM, Hampus Wessman
<hampus....@gmail.com> wrote:
Here I'm trying to formalize all the AOF/EXPIRE/REPLICATION issues we
have, in order to propose a full fix.
Hampus's fix works for most cases, but we have a few more... so here is
the full list of issues.
Any feedback from Pieter and Hampus will be appreciated, as this stuff
is pretty subtle, and three pairs of eyes are better than one for sure.
1) Currently we are processing expires while loading the AOF.
Quoting Hampus:
> RPUSH test foo
> EXPIRE test 10
> RPUSH test bar
> SHUTDOWN
>
> and then restart Redis after 'test' has expired, you will find that test
> is not empty. While loading the append-only file 'test' is deleted when
> the expire command is loaded and then the next command creates a new
> list with one item in it. This is not what we would expect [snip].
This issue can be easily fixed by not processing expires while the DB is
loading, since IF the key really expired, we'll find a DEL that was
synthesized by Redis and sent to the AOF and to the slaves.
But this brings us to bug #2.
2) Expires are put into the AOF and replicated as EXPIREAT, which is a
time-dependent operation.
The problem is that EXPIREAT with a Unix time in the past has the
semantics of a DEL!
(The same happens for EXPIRE with a negative TTL.)
So what happens is that if you have:
> RPUSH test foo
> EXPIRE test 10
> RPUSH test bar
this will actually get into the AOF and be replicated as:
> RPUSH test foo
> EXPIREAT test <some unix time>
> RPUSH test bar
Here we have two cases: either there is an explicit DEL before the
second RPUSH (because more than 10 seconds elapsed between the first and
the second RPUSH), or no DEL is synthesized (because less than 10
seconds elapsed).
With the current patch, what happens is that EXPIREAT is not processed
at all:
- if (seconds <= 0) {
+ if (seconds <= 0 && !server.loading) {
So the "test" key remains without an expire set, as a persistent key,
which is clearly an issue.
The fix for that is: while server.loading is true, we process an
EXPIREAT in the past as an EXPIRE with the current time.
Something like:
if (seconds <= 0) {
    if (server.loading) {
        setExpire(db,key,time(NULL));
    } else {
        .... delete it
    }
}
From the point of view of replication this is not a problem as we
replicate EXPIRE as EXPIRE, not translating it as EXPIREAT.
3) EXPIRE with -1 and EXPIREAT in the past.
EXPIRE with a TTL < 0 has the effect of deleting the key, but it gets
replicated as:
*3
$3
set
$1
x
$4
ciao
*3
$8
EXPIREAT
$1
x
$10
1309859420
So it breaks whenever that timestamp does not happen to be in the past
(e.g. clock issues on the server).
Similarly, EXPIREAT with a time in the past will also delete the key.
Same issue here with the AOF. But there is also an issue with
replication, as the slaves' clocks may be out of sync.
To fix both these issues, an EXPIRE with -1 and an EXPIREAT in the past
should both be replicated either as DEL or as EXPIRE -1.
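Why the absolute timestamp is fragile can be shown with a tiny model (a hypothetical helper, not Redis code): whether an EXPIREAT deletes the key depends entirely on the local clock of whoever replays it, so a slave with a skewed clock can disagree with its master. An explicit DEL has no such clock dependence.

```c
#include <assert.h>

/* EXPIREAT semantics in miniature: it acts as a DEL iff the given
 * timestamp is in the past according to the LOCAL clock. A master
 * translating "EXPIRE x -1" at time t=100 emits "EXPIREAT x 100";
 * what the slave does with it depends on the slave's own clock. */
int expireat_deletes(long long when, long long local_now) {
    return when <= local_now;
}
```

An in-sync slave deletes the key as intended, but a slave whose clock is even one second behind keeps it alive, which is exactly the inconsistency described above.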
Ok, I think this is the full list of issues and fixes we need.
Comments welcome.
Cheers,
Salvatore
On Sun, Jul 3, 2011 at 12:48 PM, Hampus Wessman
<hampus....@gmail.com> wrote:
On 2011-07-05 11:54, Salvatore Sanfilippo wrote:
> Hello again :)
>
> Here I'm trying to formalize all the AOF/EXPIRE/REPLICATION issues we
> have, in order to propose a full fix.
> Hampus fix works for most cases, but we have a few more... so here is
> the full list of issues.
Yes, indeed. Let's try to cover them all.
Yes and the second case is the interesting one :)
> With the current patch what happens is that EXPIREAT is completely not
> processed:
>
> - if (seconds <= 0) {
> + if (seconds <= 0 && !server.loading) {
>
> So the "test" key remains without an expire set, as a persistent key,
> that is clearly an issue.
Actually, the else-block will set an expiration on the key with a time
in the past so the key should be deleted later on in serverCron() or
expireIfNeeded(), whichever gets there first. The latter should make
sure that these expired keys are never returned to a client. I think
these functions all work, even in possibly quite extreme cases (the
expiration time could be several days in the past here!).
> The fix about that is that while server.loading is true we process an
> EXPIREAT in the past as an EXPIRE with the current time.
> Something like:
>
> if (seconds <= 0) {
> if (server.loading) {
> setExpire(db,key,time(NULL));
> } else {
> .... delete it
> }
> }
Have a look at the else-clause and see if it does what we need.
Hopefully it's enough already.
> From the point of view of replication this is not a problem as we
> replicate EXPIRE as EXPIRE, not translating it as EXPIREAT.
>
> 3) EXPIRE with -1 and EXPIREAT in the past.
>
> EXPIRE with < 0 TTL has the effect of deleting the key, but it gets
> replicate as:
>
> *3
> $3
> set
> $1
> x
> $4
> ciao
> *3
> $8
> EXPIREAT
> $1
> x
> $10
> 1309859420
>
> So it is broken every time the time does not happen to be in the past
> (clock issues in the server).
>
> Similarly EXPIREAT with a time that is in the past will also delete
> the key. Same issue here about the AOF. But there is also an issue
> with the replication here, as the slaves may have the clock in desync.
>
> To fix both this issues an EXPIRE with -1 and an EXPIREAT in the past
> should be both replicated either as DEL or as EXPIRE -1.
This is definitely a problem, both for the AOF and replication.
Expirations are already replicated and logged as DELs in other cases, so
the best thing would probably be to do so here too.
I updated my branch with a small fix. It calls propagateExpire() instead
of replicating the expiration itself. That should send a DEL to both the
AOF and slaves (the commit msg only mentions replication, but I've
already pushed it...). I also added one more check so that slaves never
delete a key when they see an EXPIRE/EXPIREAT command with an expiration
time in the past, as that could still easily happen when doing EXPIREAT
one second into the future or similar. The slave will set an expiration
instead and wait for a DEL from the master as usual.
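The resulting behavior after this fix can be summarized in a small decision sketch (simplified, hypothetical names, not the actual patch): only a master at normal runtime deletes the key immediately (and propagates a DEL); while loading the AOF, or on a slave, the key just gets an expiration set and waits for the explicit DEL.

```c
#include <assert.h>

/* What happens on an EXPIRE/EXPIREAT whose time is already in the
 * past, after the fix discussed above. */
typedef enum {
    DO_DELETE_AND_PROPAGATE_DEL, /* master at runtime: delete, send DEL */
    DO_SET_EXPIRE                /* AOF loading or slave: expire lazily */
} outcome;

outcome expire_in_past(int loading, int is_slave) {
    if (!loading && !is_slave) return DO_DELETE_AND_PROPAGATE_DEL;
    return DO_SET_EXPIRE;
}
```

This keeps the slave's dataset consistent with the master's: the deletion always originates from the master's explicit DEL, never from the slave's own clock.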
A problem with this last fix is that it won't be completely compatible
with earlier versions. If the master runs an earlier version it will
replicate 'EXPIRE foo -1' as is and a slave with this new version will
not delete the key when it sees that. On the other hand, similar things
can already happen due to the bug we want to fix... We could create a
workaround for this, but I'm not sure it's worth it. What do you think?
> Ok I think this is the full lists of issues and fixes we need.
>
> Comments welcomed.
>
I think that's everything too. Good that you spotted problem #3 above.
Nice summary of it all too. Please check my update when you have time. I
may still have missed something, but otherwise this could be it.
My updated branch: https://github.com/hampus/redis/commits/fixaofexpire
Regards,
Hampus
Here's a new branch with a better commit message and a rebase:
https://github.com/hampus/redis/commits/fixexpire
Feel free to solve it differently, of course. This is just a suggestion.
Cheers,
Hampus
Hey Hampus!
> Actually, the else-block will set an expiration on the key with a time
> in the past so the key should be deleted later on in serverCron() or
> expireIfNeeded(), whichever gets there first. The latter should make
> sure that these expired keys are never returned to a client. I think
> these functions all work, even in possibly quite extreme cases (the
> expiration time could be several days in the past here!).
You are right, thanks; I overlooked the "fall through" :)
>> To fix both this issues an EXPIRE with -1 and an EXPIREAT in the past
>> should be both replicated either as DEL or as EXPIRE -1.
>
> This is definitely a problem, both for the AOF and replication.
> Expirations are already replicated and logged as DELs in other cases, so
> the best thing would probably be to do so here too.
>
> I updated my branch with a small fix. It calls propagateExpire() instead
> of replicating the expiration itself. That should send a DEL to both the
> AOF and slaves (the commit msg only mentions replication, but I've
> already pushed it...). I also added one more check so that slaves never
> delete a key when they see an EXPIRE/EXPIREAT command with an expiration
> time in the past, as that could still easily happen when doing EXPIREAT
> one second into the future or similar. The slave will set an expiration
> instead and wait for a DEL from the master as usual.
Great! Sounds like a sensible fix; possibly we can refactor this later
if needed, but for now it looks good.
> A problem with this last fix is that it won't be completely compatible
> with earlier versions. If the master runs an earlier version it will
> replicate 'EXPIRE foo -1' as is and a slave with this new version will
> not delete the key when it sees that. On the other hand, similar things
> can already happen due to the bug we want to fix... We could create a
> workaround for this, but I'm not sure it's worth it. What do you think?
I think it's fine... this is a really subtle incompatibility, and the
old behavior was bug-driven; I don't think there is a single user
relying on such a "feature" :)
> I think that's everything too. Good that you spotted problem #3 above.
> Nice summary of it all too. Please check my update when you have time. I
> may still have missed something, but otherwise this could be it.
>
> My updated branch: https://github.com/hampus/redis/commits/fixaofexpire
Awesome, I'm going to check the code and very probably I'll merge it.
Will reply with a follow-up in a few hours at most.
Thank you,
Salvatore
> Regards,
Great work, I see only a small problem:
- if (seconds <= 0 && !server.loading) {
- if (dbDelete(c->db,key)) server.dirty++;
+ /* Never delete key immediately when loading AOF or if this is a slave */
+ if (seconds <= 0 && !server.loading && !server.masterhost) {
+ if (dbDelete(c->db,key)) propagateExpire(c->db,key);
Here we are no longer incrementing the dirty counter, so the "save"
will not trigger, assuming you have only changes of this (strange)
kind. Not a big problem... but in general this shows us how we are no
longer ok with the current semantics in the internals.
We should have two separate counters: one is dirty, and one is
server.propagate (or similar).
This way we can have a simple function that we call when the intention
is to propagate the command to the AOF/replication link, like:
propagateCommand(PROPAGATE);
while the dirty counter is incremented only for the sake of triggering
the save operation.
We can do that in unstable the first time there is a chance for a
refactoring (possibly after Redis 3.0 we should stop and just do it).
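A rough sketch of the two-counter idea (server.propagate is a proposed name here, not an existing Redis field): "dirty" only drives the background-save conditions, while "propagate" decides whether the command reaches the AOF/replication link, and a delete caused by an expire should bump both.

```c
#include <assert.h>

/* Proposed split of responsibilities (illustrative only). */
typedef struct {
    long long dirty;     /* triggers BGSAVE conditions */
    long long propagate; /* feeds the AOF / replication link */
} counters;

/* A delete caused by an expired key or a negative-TTL expire: the
 * dataset changed, so it must both count toward save conditions
 * (the detail missed in the patch above) and reach AOF/slaves as a
 * DEL. */
counters after_expire_as_del(counters s) {
    s.dirty++;
    s.propagate++;
    return s;
}
```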
Otherwise the patch is fine, but in order to fix this small problem
maybe it's worth calling the rewriteClientCommandVector() function
instead?
Cheers,
Salvatore
> Cheers,
True. That was something I hadn't thought about at all ;) Not a big
problem, but still illogical if expire commands don't count here!
> We should have two separated counters, one is dirty, and one is
> server.propagate (or alike).
> This way we can have a simple function that we call when the intention
> is to propagate the command to AOF/Replication link like:
>
> propagateCommand(PROPAGATE);
>
> while dirty stuff is incremented only for the sake of triggering the
> save operation.
>
> We can do that into unstable the first time there is a chance for a
> refactoring (possibly after Redis 3.0 we should stop and just do
> that).
I agree. Something like that would make it clearer.
> Otherwise the patch is fine, but in order to fix this small problem
> maybe it's worth calling the rewriteClientCommandVector() function
> instead?
Or we simply treat expires the same regardless of whether they expire
instantly or not, i.e. we remove the if-statement and always set an
expiration instead (the else-statement, right now). I don't think it is
that common to issue EXPIRE/EXPIREAT with a negative TTL, and in that
case we shouldn't lose much by doing it this way, as far as I can tell.
The expired keys will stay in memory for a while, but it shouldn't
have any other effects.
What do you think? That's probably the easiest way. Otherwise
rewriteClientCommandVector() looks like the way to go.
Hampus
As I said to Pieter yesterday during an IRC chat, that would be the way
to go: it was a design error to allow negative expires to behave as
deletes. But this is too big a semantic change at this point, I
think... so in theory I like the idea, but I think it's better to keep
the semantics unchanged for now. Maybe we can break this later with
the 3.0 release, but in that case I would emit an error for EXPIRE
with a TTL < 0 too... and just use the current time if a timestamp is
in the past in an EXPIREAT call.
You can either update the branch or I can merge and add a commit to
change this, as you wish.
Thanks!
Salvatore
Yes, but I would expect expireIfNeeded() to delete the key later anyway,
before the client ever notices the difference (on the first access, as
its expiration time has already passed). Possibly it isn't guaranteed to
work like that, but I think it will. Or?
> You can either update the branch or I can merge and add a commit to
> change this, as you wish.
Probably easier that you add a commit to fix this last thing. Thanks for
merging!
Cheers,
Hampus
Since this is obtained with lazy expiring, the main differences would be:
1) DBSIZE will include the key in the count. Not a big issue.
2) If on the master you don't touch the key, and it doesn't get sampled
soon enough, on the slave you can see the key as existing. This already
happens with EXPIRE and is something we probably have to live with, but
if it happens in a command that is semantically a delete, it is a
bigger issue.
>> You can either update the branch or I can merge and add a commit to
>> change this, as you wish.
>
> Probably easier that you add a commit to fix this last thing. Thanks for
> merging!
Perfect, thanks!
Yeah, that's true. Good point. #2 could happen after loading the AOF
too. Probably less problematic there, but we could go through all
expired keys after we finish loading and delete them. It would be a
nice improvement, but it requires some more code (I think), so probably
better to wait on that.
Great. I merged the first two commits in your branch and added the
third commit with the change we talked about plus another one.
Pieter, Hampus: do all the changes look ok to you as well? If so, I'm
going to backport everything into 2.2 / 2.4.
The problem of the slave not being able to expire things for reasons
of consistency (basically that it is required to wait for the DEL from
the master) is a real one and needs to be addressed at some point in
the future but will be pretty invasive.
In the review I came across a minor thing that doesn't impact behavior
but was suboptimal, patched here:
https://github.com/pietern/redis/tree/unstable-expire-loading
Cheers,
Pieter
I did find one problem, but apart from that I can't think of any way
this wouldn't work now either.
The problem I found is that call() in redis.c keeps a reference to the
original command and passes it on to feedAppendOnlyFile(). When we
replace an EXPIRE with a DEL, feedAppendOnlyFile() still believes it is
an EXPIRE and tries to convert it into an EXPIREAT. This doesn't work as
the arguments have been changed (it crashes).
Not sure how to best fix that. We should probably check argv[0] in
feedAppendOnlyFile() instead of cmd. Slightly slower, but should work.
I think this one deserves a test case too. It happens if you run SET foo
bar, EXPIRE foo -1 with AOF enabled.
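The stale-pointer problem can be illustrated with a toy model (hypothetical names, not the actual Redis functions): call() keeps the command-table entry looked up for the original command, but rewriteClientCommandVector() has replaced argv, so the AOF feed may still treat a rewritten DEL as an EXPIRE and attempt the EXPIRE-to-EXPIREAT translation on the wrong arguments. Checking argv[0] instead of the stale pointer avoids that.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if the AOF feed would attempt the EXPIRE->EXPIREAT
 * translation for this call.
 *
 * stale_cmd_name: the command call() originally looked up.
 * argv0:          what the (possibly rewritten) command vector holds.
 * check_argv0:    the proposed fix — trust argv[0], not the stale name. */
int tries_expire_translation(const char *stale_cmd_name,
                             const char *argv0,
                             int check_argv0) {
    const char *name = check_argv0 ? argv0 : stale_cmd_name;
    return strcmp(name, "EXPIRE") == 0;
}
```

In the crashing scenario, "EXPIRE foo -1" has been rewritten into "DEL foo": the buggy path still tries the translation against DEL's arguments, while the argv[0] check correctly skips it.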
Getting late here, so I won't write a patch now. Better that you have a
look at it tomorrow, Salvatore. I can write a fix and/or test case
tomorrow if you want. Only if it helps.
BR,
Hampus
Hampus: thanks for spotting this! I'm adding a test and trying to fix
it the proper way.
I'll look at everything you guys posted.
Salvatore
This is what I think should be the direction for unstable/2.4; it's
probably not a good idea for 2.2, where we need a less "involved"
patch:
https://github.com/antirez/redis/commit/09e2d9eeba3ff65fd60f905a5bcb0f684f7a883e
Cheers,
Salvatore