The stack traces are very helpful, thank you. I'm investigating the issue.
Cheers,
Pieter
> --
> You received this message because you are subscribed to the Google Groups
> "Redis DB" group.
> To view this discussion on the web visit
> https://groups.google.com/d/msg/redis-db/-/Ei3MZy24CaMJ.
> To post to this group, send email to redi...@googlegroups.com.
> To unsubscribe from this group, send email to
> redis-db+u...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/redis-db?hl=en.
Hello Pieter, I was starting to take a look as well. This is probably
related to my fix for the BRPOPLPUSH replication bug, but I'm not 100%
sure.
From the trace it looks very likely to be the same bug, btw, with
incrRefCount() called against an already freed or otherwise invalid
object.
Cheers,
Salvatore
--
Salvatore 'antirez' Sanfilippo
open source developer - VMware
http://invece.org
"We are what we repeatedly do. Excellence, therefore, is not an act,
but a habit." -- Aristotle
Thanks,
Salvatore
On Wed, Jun 29, 2011 at 9:35 AM, Matt Billenstein <mbil...@gmail.com> wrote:
> Thanks much guys.
> m
>
Thank you. Are you using a timeout of 0 (forever), or is there an
actual timeout? Sorry for all the questions, but this helps!
Salvatore
Pieter: note that...
connected_clients:2, blocked_clients:5
so it is in the unregistering code, I guess.
And this is probably why the client has a target key: it is simply
reused memory and that field is not NULL.
Cheers,
Very interesting, so it is not "new iterator" related, and yet it
survived the refactoring of this part of the code.
Since you are so kind, could you please give the latest unstable branch a try?
Thanks,
Salvatore
The weird thing here is that "server.bpop_blocked_clients" is only
touched in two places: one where it is incremented (upon blocking the
client) and one where it is decremented (when a value is pushed to
unblock it, or the client releases its connection). The decrement is
apparently not done, which leads me to believe the "flags" field on the
client struct is somehow changed to mark the client as not blocking,
which in turn causes the blocking keys dictionary to not be cleaned up.
Matt: can you somehow trace which commands are executed here (since
there are only about 10), and where they originate? It would also be
helpful to know if clients are randomly dropping connections or
something similar.
Thanks,
Pieter
Agreed... I tried different scripts simulating a producer/consumer
with BRPOP; there is no way I can reproduce it, even when implementing
random connection dropping.
I think that at this point the simplest thing to do, since Matt is
able to reproduce the bug in very little time, is to ask him to compile
Redis with:
make noopt
and then run it under valgrind and send us the valgrind trace, which
will finally show where the issue happens.
Cheers,
Salvatore
Cheers,
Pieter
It appears that a client could potentially be "blocked" more than once
(without being unblocked in-between) in unstable, so that if you send
two blocking commands after each other (without waiting for a reply) the
above could happen (including all errors) because only the last one will
unblock cleanly later. This can happen because unstable keeps running
commands even when a client is blocked, due to this deleted line when
diskstore was removed:
https://github.com/antirez/redis/commit/c9d0c3623a7714bd41a35237f4ba927206a7adb6#L8L800.
I think that can be a problem. We probably still need "if (c->flags &
REDIS_BLOCKED) return;" there, unless I'm missing something.
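The missing guard can be sketched with a toy model (a simplified struct and a made-up flag value, not the real Redis code): with the check in place, a second blocking command arriving while the client is already blocked is simply not executed yet, so the client cannot be blocked twice without an unblock in between.

```c
#include <assert.h>

/* Hypothetical flag value for illustration only. */
#define REDIS_BLOCKED 16

typedef struct {
    int flags;
    int blocked_count; /* how many times this client ended up blocked */
} fakeClient;

/* Stand-in for the blocking path of BLPOP/BRPOP. */
static void blockForKeys(fakeClient *c) {
    c->flags |= REDIS_BLOCKED;
    c->blocked_count++;
}

/* Stand-in for the command-processing loop: with the guard enabled, a
 * command received while the client is already blocked is deferred. */
static void processCommand(fakeClient *c, int guard_enabled) {
    if (guard_enabled && (c->flags & REDIS_BLOCKED)) return;
    blockForKeys(c);
}

/* How many times does the client get blocked after two back-to-back
 * blocking commands (pipelined, no reply awaited in between)? */
int blocked_times_after_two_commands(int guard_enabled) {
    fakeClient c = {0, 0};
    processCommand(&c, guard_enabled);
    processCommand(&c, guard_enabled);
    return c.blocked_count;
}
```

Without the guard the counter goes out of sync exactly as described above, which would also explain blocked_clients exceeding connected_clients.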
This still doesn't explain the current issue for other versions and in
fact I doubt it is the cause of any of these problems (but it's still a
potential problem). It could be something similar, though. Perhaps there
is some other way that blockForKeys() can be called more than once
without unblockClientWaitingData() being called in between... Either
that, or blocked clients aren't cleaned up properly, as you said.
Matt, do you use pipelining in any way?
Regards,
Hampus
About your finding, I'll check that as well and fix if needed, thanks
for investigating.
Cheers,
Salvatore
BRPOP and BLPOP are replicated without being translated into an
RPOP/LPOP, and there is no special check, so this client gets queued
among the blocked clients, but it is actually always the same "fake"
client used to load the AOF. So at the first push, booooooommmmmmm.
Fixing it today and releasing a new Redis version.
Matt, Pieter, Hampus: thanks for your assistance.
Salvatore
Investigating further to understand how a BRPOP can be written to the
AOF in such a way that, when it is processed, the target list is empty.
Salvatore
On Wed, Jun 29, 2011 at 1:33 PM, Hampus Wessman
Hello Matt, if my findings are right, the reason you can't reproduce
this again is that you deleted or rewrote your AOF file.
Apparently your test Redis instance somehow generated (in a way that is
currently not clear) an AOF file where a BRPOP was present without a
corresponding target list element. This corrupted the internal state
of every Redis instance you were trying to test.
Once the corrupted AOF was no longer loaded by the instances, the bug
disappeared.
Now the new problem is to understand how an AOF with this
characteristic was generated.
Cheers,
Salvatore
Update about the issue. It is still not clear how the corrupted AOF
was generated (I put an entry on my TODO list to perform further
tests); btw, I just patched Redis 2.2, 2.4 and unstable to report
this problem while loading the AOF, instead of crashing at random
later at runtime.
Analyzing the offending AOF file may be of some help; however, I think
Matt is unlikely to still have it at hand, since the AOF file was not
suspected at all during the analysis... but if the contrary is true I'll
be glad to get a copy ;)
Cheers,
Salvatore
I found out how these commands (probably) got there too. It is easy to
reproduce it all by running the following commands and then restarting
Redis after the expiration has taken place. Make sure the expiration
happens after all three commands are run and before Redis is started
again.
RPUSH test value
EXPIRE test 10
BLPOP test 0
Redis doesn't write BLPOP to the AOF if it blocks, but if it doesn't
block it is written to the append-only file. This is not a problem if it
doesn't block when loaded either. With an expiration like the above it
will block on load (because the key has expired and doesn't exist
anymore), but not when run the first time (it hasn't expired yet), and
this difference is what triggers the bug. Transforming blocking pops
into non-blocking pops when writing to the AOF should solve it, as
suggested earlier.
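The asymmetry just described can be modeled in a few lines (hypothetical helper names, not Redis internals): the BLPOP is logged precisely because it did not block at runtime, yet it blocks when replayed after the expiration has taken place.

```c
#include <assert.h>

/* Toy model of the BLPOP/AOF asymmetry. The key exists until
 * expire_at (a unix timestamp); both helpers are illustrative
 * stand-ins, not actual Redis functions. */

/* Runtime: a BLPOP issued at time t pops immediately (and is therefore
 * appended to the AOF as-is) iff the key still exists. */
int blpop_logged(long long t, long long expire_at) {
    return t < expire_at; /* didn't block -> written to the AOF */
}

/* AOF load at time t: replaying the logged BLPOP blocks iff the key
 * has expired by now, which would block the fake loading client and
 * corrupt the server's internal state. */
int blpop_blocks_on_load(long long t, long long expire_at) {
    return t >= expire_at;
}
```

The dangerous case is exactly the one in the repro: logged at t=5 with a key expiring at t=10, replayed after a restart at t=20.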
Hampus
RPUSH test foo
EXPIRE test 10
RPUSH test bar
SHUTDOWN
and then restart Redis after 'test' has expired, you will find that
'test' is not empty. While loading the append-only file, 'test' is
deleted when the EXPIRE command is loaded, and then the next command
creates a new list with one item in it. This is not what we would
expect, and it differs from what would happen if Redis were running when
the expiration happened: in that case the whole key would be deleted
after 10 seconds.
It turns out that solving this problem is rather easy and it
automatically solves the other problem too :) I created a branch with
the small fix (two lines of code changed) and a new test case. The fix
makes sure keys aren't expired while we are loading. They will be
expired after the load instead, just like if we hadn't restarted Redis.
The branch is on GitHub: https://github.com/hampus/redis/tree/fixaofexpire
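A minimal sketch of the idea behind the fix, with simplified stand-ins for the Redis internals (the real change guards the expiration check on server.loading):

```c
#include <assert.h>

/* Sketch of the two-line fix: expireIfNeeded() must not delete keys
 * while the AOF is loading; they expire lazily after the load, just as
 * if the server had never been restarted.
 *
 * expire_at: unix time of expiration (-1 means no expiration set).
 * Returns 1 if the key should be deleted now. */
int expire_if_needed(int loading, long long expire_at, long long now) {
    if (expire_at == -1 || now < expire_at) return 0; /* not expired */
    if (loading) return 0; /* the fix: never expire while loading */
    return 1;              /* normal path: delete the expired key */
}
```

With the guard, the EXPIRE in the repro above just sets an expiration time during load; the actual deletion happens after loading, so the subsequent RPUSH in the AOF replays against the still-existing list, matching what a non-restarted server would have done.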
Cheers,
Hampus
* What happens in the following scenario:
1) We are reading an AOF file.
2) SETEX FOO 5 BAR
3) APPEND FOO ZAP
What happens if, for some reason (system under huge load or similar),
too much time passes between steps 2 and 3? We should prevent expires
while the AOF is loading.
I made a note of this and then forgot about it afterward...
I'll check this whole email and the fix on Monday. Thank you!
Salvatore
I studied the issue a bit; your patch is in the right direction, but I
think it is not enough, for the following reason:
EXPIRE key -1 (and in general any value < 0) has the effect of deleting the key.
The problem is, we replicate it as EXPIREAT, while we should
explicitly replicate it untouched as EXPIRE key -1.
Otherwise, with your fix preventing expires with a negative value from
being executed while loading the AOF, we get an inconsistency.
So the right fix should be to replicate EXPIRE untouched for negative
TTLs (since it's a DEL), and also execute these calls when loading the
AOF. We also need to think about EXPIREAT in this context, and in the
context of replication; maybe we have some race there as well... I'll
write a better email about this whole issue tomorrow, this one is just
to check whether you also think there is an issue here.
Cheers,
Salvatore
On Sun, Jul 3, 2011 at 12:48 PM, Hampus Wessman
<hampus....@gmail.com> wrote:
Here I'm trying to formalize all the AOF/EXPIRE/REPLICATION issues we
have, in order to propose a full fix.
Hampus's fix works for most cases, but we have a few more... so here is
the full list of issues.
Any feedback from Pieter and Hampus will be appreciated, as this stuff
is pretty subtle, and three pairs of eyes are better than one for sure.
1) Currently we are processing expires while loading the AOF.
Quoting Hampus:
> RPUSH test foo
> EXPIRE test 10
> RPUSH test bar
> SHUTDOWN
>
> and then restart Redis after 'test' has expired, you will find that test
> is not empty. While loading the append-only file 'test' is deleted when
> the expire command is loaded and then the next command creates a new
> list with one item in it. This is not what we would expect [snip].
This issue can be easily fixed by not processing expires while the DB is
loading, since IF the key really expired, we'll find a DEL that was
synthesized by Redis and sent to the AOF and to the slaves.
But this brings us to bug #2.
2) Expires are put into the AOF and replicated as EXPIREAT, which is a
time-dependent operation.
The problem is that EXPIREAT with a Unix time in the past has the
semantics of a DEL!
(The same happens for EXPIRE with a negative TTL.)
So what happens is that if you have:
> RPUSH test foo
> EXPIRE test 10
> RPUSH test bar
this will actually get into the AOF and be replicated as:
> RPUSH test foo
> EXPIREAT test <some unix time>
> RPUSH test bar
Here we have two cases: either there is an explicit DEL before the
second RPUSH (because more than 10 seconds elapsed between the first and
the second RPUSH), or no DEL is synthesized (because less than 10
seconds elapsed).
With the current patch, what happens is that EXPIREAT is not processed
at all:
- if (seconds <= 0) {
+ if (seconds <= 0 && !server.loading) {
So the "test" key remains without an expire set, as a persistent key,
which is clearly an issue.
The fix for that is: while server.loading is true, we process an
EXPIREAT in the past as an EXPIRE with the current time.
Something like:
if (seconds <= 0) {
    if (server.loading) {
        setExpire(db,key,time(NULL));
    } else {
        .... delete it
    }
}
From the point of view of replication this is not a problem as we
replicate EXPIRE as EXPIRE, not translating it as EXPIREAT.
3) EXPIRE with -1 and EXPIREAT in the past.
EXPIRE with a TTL < 0 has the effect of deleting the key, but it gets
replicated as:
*3
$3
set
$1
x
$4
ciao
*3
$8
EXPIREAT
$1
x
$10
1309859420
So it breaks whenever that timestamp does not happen to be in the past
(e.g. clock issues on the server).
Similarly, EXPIREAT with a time in the past will also delete the key.
Same issue here with the AOF. But there is also an issue with
replication, as the slaves' clocks may be out of sync.
To fix both these issues, an EXPIRE with -1 and an EXPIREAT in the past
should both be replicated either as DEL or as EXPIRE -1.
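Why the absolute timestamp is fragile can be shown with a tiny model (a hypothetical helper, not Redis code): whether an EXPIREAT deletes the key depends entirely on the local clock of whoever replays it, so a slave with a skewed clock can disagree with its master. An explicit DEL has no such clock dependence.

```c
#include <assert.h>

/* EXPIREAT semantics in miniature: it acts as a DEL iff the given
 * timestamp is in the past according to the LOCAL clock. A master
 * translating "EXPIRE x -1" at time t=100 emits "EXPIREAT x 100";
 * what the slave does with it depends on the slave's own clock. */
int expireat_deletes(long long when, long long local_now) {
    return when <= local_now;
}
```

An in-sync slave deletes the key as intended, but a slave whose clock is even one second behind keeps it alive, which is exactly the inconsistency described above.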
Ok, I think this is the full list of issues and fixes we need.
Comments welcome.
Cheers,
Salvatore
On Sun, Jul 3, 2011 at 12:48 PM, Hampus Wessman
<hampus....@gmail.com> wrote:
On 2011-07-05 11:54, Salvatore Sanfilippo wrote:
> Hello again :)
>
> Here I'm trying to formalize all the AOF/EXPIRE/REPLICATION issues we
> have, in order to propose a full fix.
> Hampus fix works for most cases, but we have a few more... so here is
> the full list of issues.
Yes, indeed. Let's try to cover them all.
Yes and the second case is the interesting one :)
> With the current patch what happens is that EXPIREAT is completely not
> processed:
>
> - if (seconds <= 0) {
> + if (seconds <= 0 && !server.loading) {
>
> So the "test" key remains without an expire set, as a persistent key,
> that is clearly an issue.
Actually, the else-block will set an expiration on the key with a time
in the past so the key should be deleted later on in serverCron() or
expireIfNeeded(), whichever gets there first. The latter should make
sure that these expired keys are never returned to a client. I think
these functions all work, even in possibly quite extreme cases (the
expiration time could be several days in the past here!).
> The fix about that is that while server.loading is true we process an
> EXPIREAT in the past as an EXPIRE with the current time.
> Something like:
>
> if (seconds <= 0) {
> if (server.loading) {
> setExpire(db,key,time(NULL));
> } else {
> .... delete it
> }
> }
Have a look at the else-clause and see if it does what we need.
Hopefully it's enough already.
> From the point of view of replication this is not a problem as we
> replicate EXPIRE as EXPIRE, not translating it as EXPIREAT.
>
> 3) EXPIRE with -1 and EXPIREAT in the past.
>
> EXPIRE with < 0 TTL has the effect of deleting the key, but it gets
> replicate as:
>
> *3
> $3
> set
> $1
> x
> $4
> ciao
> *3
> $8
> EXPIREAT
> $1
> x
> $10
> 1309859420
>
> So it is broken every time the time does not happen to be in the past
> (clock issues in the server).
>
> Similarly EXPIREAT with a time that is in the past will also delete
> the key. Same issue here about the AOF. But there is also an issue
> with the replication here, as the slaves may have the clock in desync.
>
> To fix both this issues an EXPIRE with -1 and an EXPIREAT in the past
> should be both replicated either as DEL or as EXPIRE -1.
This is definitely a problem, both for the AOF and replication.
Expirations are already replicated and logged as DELs in other cases, so
the best thing would probably be to do so here too.
I updated my branch with a small fix. It calls propagateExpire() instead
of replicating the expiration itself. That should send a DEL to both the
AOF and slaves (the commit msg only mentions replication, but I've
already pushed it...). I also added one more check so that slaves never
delete a key when they see an EXPIRE/EXPIREAT command with an expiration
time in the past, as that could still easily happen when doing EXPIREAT
one second into the future or similar. The slave will set an expiration
instead and wait for a DEL from the master as usual.
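The resulting behavior after this fix can be summarized in a small decision sketch (simplified, hypothetical names, not the actual patch): only a master at normal runtime deletes the key immediately (and propagates a DEL); while loading the AOF, or on a slave, the key just gets an expiration set and waits for the explicit DEL.

```c
#include <assert.h>

/* What happens on an EXPIRE/EXPIREAT whose time is already in the
 * past, after the fix discussed above. */
typedef enum {
    DO_DELETE_AND_PROPAGATE_DEL, /* master at runtime: delete, send DEL */
    DO_SET_EXPIRE                /* AOF loading or slave: expire lazily */
} outcome;

outcome expire_in_past(int loading, int is_slave) {
    if (!loading && !is_slave) return DO_DELETE_AND_PROPAGATE_DEL;
    return DO_SET_EXPIRE;
}
```

This keeps the slave's dataset consistent with the master's: the deletion always originates from the master's explicit DEL, never from the slave's own clock.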
A problem with this last fix is that it won't be completely compatible
with earlier versions. If the master runs an earlier version it will
replicate 'EXPIRE foo -1' as is and a slave with this new version will
not delete the key when it sees that. On the other hand, similar things
can already happen due to the bug we want to fix... We could create a
workaround for this, but I'm not sure it's worth it. What do you think?
> Ok I think this is the full lists of issues and fixes we need.
>
> Comments welcomed.
>
I think that's everything too. Good that you spotted problem #3 above.
Nice summary of it all too. Please check my update when you have time. I
may still have missed something, but otherwise this could be it.
My updated branch: https://github.com/hampus/redis/commits/fixaofexpire
Regards,
Hampus
Here's a new branch with a better commit message and a rebase:
https://github.com/hampus/redis/commits/fixexpire
Feel free to solve it differently, of course. This is just a suggestion.
Cheers,
Hampus
Hey Hampus!
> Actually, the else-block will set an expiration on the key with a time
> in the past so the key should be deleted later on in serverCron() or
> expireIfNeeded(), whichever gets there first. The latter should make
> sure that these expired keys are never returned to a client. I think
> these functions all work, even in possibly quite extreme cases (the
> expiration time could be several days in the past here!).
You are right, thanks; I overlooked the "fall through" :)
>> To fix both this issues an EXPIRE with -1 and an EXPIREAT in the past
>> should be both replicated either as DEL or as EXPIRE -1.
>
> This is definitely a problem, both for the AOF and replication.
> Expirations are already replicated and logged as DELs in other cases, so
> the best thing would probably be to do so here too.
>
> I updated my branch with a small fix. It calls propagateExpire() instead
> of replicating the expiration itself. That should send a DEL to both the
> AOF and slaves (the commit msg only mentions replication, but I've
> already pushed it...). I also added one more check so that slaves never
> delete a key when they see an EXPIRE/EXPIREAT command with an expiration
> time in the past, as that could still easily happen when doing EXPIREAT
> one second into the future or similar. The slave will set an expiration
> instead and wait for a DEL from the master as usual.
Great! Sounds like a sensible fix; possibly we can refactor this later
if needed, but for now it looks good.
> A problem with this last fix is that it won't be completely compatible
> with earlier versions. If the master runs an earlier version it will
> replicate 'EXPIRE foo -1' as is and a slave with this new version will
> not delete the key when it sees that. On the other hand, similar things
> can already happen due to the bug we want to fix... We could create a
> workaround for this, but I'm not sure it's worth it. What do you think?
I think it's fine... this is a really subtle incompatibility, and the
old behavior was bug-driven; I don't think there is a single user
relying on such a "feature" :)
> I think that's everything too. Good that you spotted problem #3 above.
> Nice summary of it all too. Please check my update when you have time. I
> may still have missed something, but otherwise this could be it.
>
> My updated branch: https://github.com/hampus/redis/commits/fixaofexpire
Awesome, I'm going to check the code and very probably I'll merge it.
Will reply with a follow-up in a few hours at most.
Thank you,
Salvatore
> Regards,
Great work, I see only a small problem:
- if (seconds <= 0 && !server.loading) {
- if (dbDelete(c->db,key)) server.dirty++;
+ /* Never delete key immediately when loading AOF or if this is a slave */
+ if (seconds <= 0 && !server.loading && !server.masterhost) {
+ if (dbDelete(c->db,key)) propagateExpire(c->db,key);
Here we are no longer incrementing the dirty counter, so the "save"
will not trigger, assuming you have only changes of this (strange)
kind. Not a big problem... but in general this shows us how we are no
longer ok with the current semantics in the internals.
We should have two separate counters: one is dirty, and one is
server.propagate (or similar).
This way we can have a simple function that we call when the intention
is to propagate the command to the AOF/replication link, like:
propagateCommand(PROPAGATE);
while the dirty counter is incremented only for the sake of triggering
the save operation.
We can do that in unstable the first time there is a chance for a
refactoring (possibly after Redis 3.0 we should stop and just do it).
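A rough sketch of the two-counter idea (server.propagate is a proposed name here, not an existing Redis field): "dirty" only drives the background-save conditions, while "propagate" decides whether the command reaches the AOF/replication link, and a delete caused by an expire should bump both.

```c
#include <assert.h>

/* Proposed split of responsibilities (illustrative only). */
typedef struct {
    long long dirty;     /* triggers BGSAVE conditions */
    long long propagate; /* feeds the AOF / replication link */
} counters;

/* A delete caused by an expired key or a negative-TTL expire: the
 * dataset changed, so it must both count toward save conditions
 * (the detail missed in the patch above) and reach AOF/slaves as a
 * DEL. */
counters after_expire_as_del(counters s) {
    s.dirty++;
    s.propagate++;
    return s;
}
```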
Otherwise the patch is fine, but in order to fix this small problem
maybe it's worth calling the rewriteClientCommandVector() function
instead?
Cheers,
Salvatore
> Cheers,
True. That was something I hadn't thought about at all ;) Not a big
problem, but still illogical if expire commands don't count here!
> We should have two separated counters, one is dirty, and one is
> server.propagate (or alike).
> This way we can have a simple function that we call when the intention
> is to propagate the command to AOF/Replication link like:
>
> propagateCommand(PROPAGATE);
>
> while dirty stuff is incremented only for the sake of triggering the
> save operation.
>
> We can do that into unstable the first time there is a chance for a
> refactoring (possibly after Redis 3.0 we should stop and just do
> that).
I agree. Something like that would make it clearer.
> Otherwise the patch is fine, but in order to fix this small problem
> maybe it's worth calling the rewriteClientCommandVector() function
> instead?
Or we simply treat expires the same regardless of whether they expire
instantly or not, i.e. we remove the if-statement and always set an
expiration instead (the else-statement, right now). I don't think it is
that common to issue EXPIRE/EXPIREAT with a negative TTL, and in that
case we shouldn't lose much by doing it this way, as far as I can tell.
The expired keys will stay in memory for a while, but it shouldn't
have any other effects.
What do you think? That's probably the easiest way. Otherwise
rewriteClientCommandVector() looks like the way to go.
Hampus
As I said to Pieter yesterday during an IRC chat, that would be the way
to go: it was a design error to allow negative expires to behave as
deletes. But this is too big a semantic change at this point, I
think... so in theory I like the idea, but I think it's better to keep
the semantics unchanged for now. Maybe we can break this later with
the 3.0 release, but in that case I would emit an error for EXPIRE
with a TTL < 0 too... and just use the current time if a timestamp is
in the past in an EXPIREAT call.
You can either update the branch or I can merge and add a commit to
change this, as you wish.
Thanks!
Salvatore
Yes, but I would expect expireIfNeeded() to delete the key later anyway,
before the client ever notices the difference (on the first access, as
its expiration time has already passed). Possibly it isn't guaranteed to
work like that, but I think it will. Or?
> You can either update the branch or I can merge and add a commit to
> change this, as you wish.
Probably easier that you add a commit to fix this last thing. Thanks for
merging!
Cheers,
Hampus
Since this is obtained with lazy expiring, the main differences would be:
1) DBSIZE will include the key in the count. Not a big issue.
2) If on the master you don't touch the key, and it doesn't get sampled
soon enough, on the slave you can see the key as existing. This already
happens with EXPIRE and is something we probably have to live with, but
if it happens in a command that is semantically a delete, it is a
bigger issue.
>> You can either update the branch or I can merge and add a commit to
>> change this, as you wish.
>
> Probably easier that you add a commit to fix this last thing. Thanks for
> merging!
Perfect, thanks!
Yeah, that's true. Good point. #2 could happen after loading the AOF
too. Probably less problematic there, but we could go through all
expired keys after we finish loading and delete them. It would be a
nice improvement, but it requires some more code (I think), so probably
better to wait on that.
Great. I merged the first two commits in your branch and added the
third commit with the change we talked about plus another one.
Pieter, Hampus: do all the changes look ok to you as well? If so, I'm
going to backport everything into 2.2 / 2.4.
The problem of the slave not being able to expire things for reasons
of consistency (basically that it is required to wait for the DEL from
the master) is a real one and needs to be addressed at some point in
the future but will be pretty invasive.
In the review I came across a minor thing that doesn't impact behavior
but was suboptimal, patched here:
https://github.com/pietern/redis/tree/unstable-expire-loading
Cheers,
Pieter
I did find one problem, but apart from that I can't think of any way
this wouldn't work now either.
The problem I found is that call() in redis.c keeps a reference to the
original command and passes it on to feedAppendOnlyFile(). When we
replace an EXPIRE with a DEL, feedAppendOnlyFile() still believes it is
an EXPIRE and tries to convert it into an EXPIREAT. This doesn't work as
the arguments have been changed (it crashes).
Not sure how to best fix that. We should probably check argv[0] in
feedAppendOnlyFile() instead of cmd. Slightly slower, but should work.
I think this one deserves a test case too. It happens if you run SET foo
bar, EXPIRE foo -1 with AOF enabled.
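The stale-pointer problem can be illustrated with a toy model (hypothetical names, not the actual Redis functions): call() keeps the command-table entry looked up for the original command, but rewriteClientCommandVector() has replaced argv, so the AOF feed may still treat a rewritten DEL as an EXPIRE and attempt the EXPIRE-to-EXPIREAT translation on the wrong arguments. Checking argv[0] instead of the stale pointer avoids that.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if the AOF feed would attempt the EXPIRE->EXPIREAT
 * translation for this call.
 *
 * stale_cmd_name: the command call() originally looked up.
 * argv0:          what the (possibly rewritten) command vector holds.
 * check_argv0:    the proposed fix — trust argv[0], not the stale name. */
int tries_expire_translation(const char *stale_cmd_name,
                             const char *argv0,
                             int check_argv0) {
    const char *name = check_argv0 ? argv0 : stale_cmd_name;
    return strcmp(name, "EXPIRE") == 0;
}
```

In the crashing scenario, "EXPIRE foo -1" has been rewritten into "DEL foo": the buggy path still tries the translation against DEL's arguments, while the argv[0] check correctly skips it.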
Getting late here, so I won't write a patch now. Better that you have a
look at it tomorrow, Salvatore. I can write a fix and/or test case
tomorrow if you want. Only if it helps.
BR,
Hampus
Hampus: thanks for spotting this! I'm adding a test and trying to fix
it the proper way.
I'll look at everything you guys posted.
Salvatore
This is what I think should be the direction for unstable/2.4; it's
probably not a good idea for 2.2, where we need a less "involved"
patch:
https://github.com/antirez/redis/commit/09e2d9eeba3ff65fd60f905a5bcb0f684f7a883e
Cheers,
Salvatore