recently written data to dump.rdb is deleted when redis-server starts

Kyle Bragger

Feb 12, 2011, 10:37:40 AM

to Redis DB

Hi all,

We are using Redis at Forrst.com for a bevy of things including
caching, stats, etc. We have recently run into a recurring issue where
Redis dies (it has happened 3 times now) when we look at large
datasets using redis-cli. The problem this causes: when Redis
starts up again, our dump.rdb is missing days of data, even
though per our redis.conf we should flush to disk at least every
15 minutes. Is there something I'm missing about how disk writes
work, and how Redis reads data back on startup?

Thanks
Kyle

Michel Martens

Feb 12, 2011, 10:44:32 AM

to redi...@googlegroups.com

Hey,

Can you show the config you are using?

Kyle Bragger

Feb 12, 2011, 10:46:27 AM

to Redis DB

Ah yes, sorry. http://pastie.org/private/gjmok6ablj7i6g0lfv77w

On Feb 12, 10:44 am, Michel Martens <mic...@soveran.com> wrote:
> Hey,
>

Didier Spezia

Feb 12, 2011, 10:56:01 AM

to Redis DB

Hi,

Did you check the available space on the file system you put
the dump file on? You need at least twice the size of the
dump, because the new dump is written to a temporary file before
it replaces the old one. Did you get errors in the Redis log file
about the dumps?

If you have a space issue, and if you are extremely unlucky,
there is a risk the dump file is corrupted. See issue 417.

http://code.google.com/p/redis/issues/detail?id=417

Regards,
Didier.

On Feb 12, 4:46 pm, Kyle Bragger <kyle.brag...@gmail.com> wrote:
> Ah yes, sorry. http://pastie.org/private/gjmok6ablj7i6g0lfv77w

Kyle Bragger

Feb 12, 2011, 11:06:27 AM

to Redis DB

Hi Didier,

Thanks for the reply -- the dumpfile is < 600MB, and our disks are
huge. redis-check-dump shows no errors, and nothing questionable in
logs either.

Kyle

Pieter Noordhuis

Feb 13, 2011, 4:41:02 AM

to redi...@googlegroups.com

Hi Kyle,

So in short, there are two issues:
a) Redis dies when looking at large data sets with redis-cli
b) Data that should be saved isn't saved

From your email I'm not sure if a) is an issue or not, but I wanted to
put it out here to see if this is something we could resolve. As for
b) (which I believe is your "main" issue), could you post a snippet of
the log file covering about 15 minutes, so we can see if there are any
anomalies in the interval where Redis should spawn a child to save the
dump?

Didier already hinted at this, but are you sure this is not a
filesystem issue? For example, if you change the permissions or owner
of the rdb file when you back it up, Redis may no longer be able to
atomically rename the temporary dump file to the target name.

Would love to hear some more details along this line so we can resolve this.

Thanks,
Pieter

> --
> You received this message because you are subscribed to the Google Groups "Redis DB" group.
> To post to this group, send email to redi...@googlegroups.com.
> To unsubscribe from this group, send email to redis-db+u...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/redis-db?hl=en.
>
>

Salvatore Sanfilippo

Feb 13, 2011, 10:21:26 AM

to redi...@googlegroups.com

Hello,

as Pieter suggested, these are actually two different issues, possibly
with the same cause, but let's analyze the two things separately.
Let's start with a request that will provide some more data:

Please send the INFO output taken while the instance is running
normally, so that we can know the exact Redis version, the dataset
size and in general the state of the instance.

Now on to the two issues.

1) Crash. You run with VM disabled. It is believed that, at least in
2.0, there is some obscure bug caused by replication and VM
interacting in some bad way. We believe that the issue is probably NOT
present in 2.2.
But as you run without VM enabled, this should not affect you: we are
not aware of any stability problem at all in Redis 2.0 and 2.2.

When Redis crashes it is able to report the problem and even prints
a stack trace in the log file.
Can you find it? If not, is it possible that your Redis instance
was killed by the Linux OOM killer? You can see this in the kernel
log. We definitely need more help from you on this, as we have
very little information so far.

It is also possible to run Redis under gdb without any performance
penalty, so that on a crash you can ask gdb for a stack
trace and do some debugging, but let's leave this for later. For
now the Linux kernel log and the Redis log should provide enough info.

2) The problem with persistence.

Redis is a very simple system: if you understand how it works, it will
be simple to understand what's wrong with it. So how does persistence
work?

- Redis forks.
- The child starts saving the dataset to a file with a temporary name.
- When the new file is ready, it is renamed to dump.rdb, atomically,
using the rename(2) system call.
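
The three steps above follow the usual write-to-a-temporary-file-then-rename pattern. Here is a minimal Python sketch of that pattern only; it is not the Redis code (which is C and forks a child first), and the function name `save_snapshot` and the temporary file naming are assumptions made for illustration.

```python
import os
import tempfile


def save_snapshot(data: bytes, target: str = "dump.rdb") -> None:
    """Write data to a temporary file, then atomically rename it to target."""
    directory = os.path.dirname(os.path.abspath(target))
    # The temporary file must be on the same file system as the target,
    # otherwise the rename cannot replace the target atomically.
    fd, tmp_path = tempfile.mkstemp(prefix="temp-", suffix=".rdb", dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before renaming
        # os.replace uses rename(2) on POSIX: a reader sees either the old
        # dump or the new one, never a half-written file.
        os.replace(tmp_path, target)
    except BaseException:
        # Something failed before the rename: drop the partial temp file.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Because the old dump.rdb is only replaced at the very last step, a save that fails earlier leaves the previous dump untouched, which is why a failing save shows up as an old file rather than a broken one.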

So basically if you run "ls -l dump.rdb" before and after Redis saves
to disk, you can clearly see that the modification time of the file changed.
Example:

$ ls -l dump.rdb
-rw-r--r-- 1 antirez staff 99313 11 Feb 15:11 dump.rdb
$ ls -l dump.rdb
-rw-r--r-- 1 antirez staff 99313 13 Feb 16:17 dump.rdb
$

If it does not change after a "BGSAVE" command, then there is
something wrong. But if there is something wrong you can see it from
the log file.

So if, while you are running

tail -f /path/to/redis_log_file

you also issue

redis-cli BGSAVE

you should see "live" what happens.
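
The timestamp check described above can also be scripted, for example from a monitoring job. The following is a rough sketch; the function name `dump_age_seconds` and the 15 minute threshold in the usage note are assumptions for illustration, not something Redis provides.

```python
import os
import time
from typing import Optional


def dump_age_seconds(path: str = "dump.rdb", now: Optional[float] = None) -> float:
    """Return how many seconds ago the dump file was last modified."""
    if now is None:
        now = time.time()
    # st_mtime is the same modification time that "ls -l" displays.
    return now - os.stat(path).st_mtime
```

If the age is much larger than the longest `save` interval in redis.conf (say, `dump_age_seconds("/path/to/dump.rdb") > 15 * 60` for a 15 minute interval on a busy instance), background saves are probably not completing and the log file is the next place to look.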

Another possible reason why persistence is not working is that the
background saving process started a save and then stopped making
progress, but is still alive. Since the previous BGSAVE
never ended, a new one can't start. I have no idea why this would happen
with sane hardware and local disks (so no NFS or the like), but
figuring out whether this is the problem is trivial: just run
INFO and check whether a background save is in progress.
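
As a sketch of that INFO check, the snippet below parses INFO output that has already been captured as text (for example from `redis-cli INFO`). The helper names are hypothetical. The field is called `bgsave_in_progress` in Redis 2.x; later versions use `rdb_bgsave_in_progress`, so the code looks for both.

```python
def parse_info(text: str) -> dict:
    """Parse the 'key:value' lines of INFO output into a dictionary."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the '# Section' headers of newer versions.
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        fields[key] = value
    return fields


def bgsave_in_progress(info: dict) -> bool:
    """Return True if INFO reports a background save in progress."""
    for key in ("bgsave_in_progress", "rdb_bgsave_in_progress"):
        if key in info:
            return info[key] == "1"
    raise KeyError("no bgsave field found in INFO output")
```

If this keeps returning True across several checks while dump.rdb never changes, the saving child is likely stuck, and the strace approach is the next step.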

Also with ps it is possible to check the pid of the background saving
process and run an "strace -p <pid>" against it, to check what it is
doing, and why it is blocked.

Finally, are you running Redis natively or via some tool like "runit"
or other tools related to logging or restarting Redis automatically?

Ok I think this should be enough to start investigating.

Cheers,
Salvatore

--
Salvatore 'antirez' Sanfilippo
open source developer - VMware

http://invece.org
"We are what we repeatedly do. Excellence, therefore, is not an act,
but a habit." -- Aristotele

Kyle Bragger

Feb 13, 2011, 11:56:18 AM

to Redis DB

Hi Salvatore,

Cheers for such a thorough response!

After reading your post, and doing some more digging I realized that
BGSAVE has been failing:

[14325] 13 Feb 08:52:35 * 10 changes in 300 seconds. Saving...
[14325] 13 Feb 08:52:35 # Can't save in background: fork: Cannot allocate memory
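
Failures like the one in this log excerpt are easy to miss, so it can help to scan the log for them. The sketch below assumes the log format shown above (`[pid] day month time level message`, with `#` marking warnings) and one log entry per line; the function name `failed_background_saves` is made up for this example.

```python
import re

# Matches lines such as:
# [14325] 13 Feb 08:52:35 # Can't save in background: fork: ...
LOG_LINE = re.compile(
    r"^\[(?P<pid>\d+)\] (?P<when>\d+ \w+ \d\d:\d\d:\d\d) "
    r"(?P<level>[-*#.]) (?P<msg>.*)$"
)


def failed_background_saves(lines):
    """Return (timestamp, message) pairs for failed background saves."""
    failures = []
    for line in lines:
        match = LOG_LINE.match(line.rstrip("\n"))
        if not match:
            continue  # not a log line in the assumed format
        if match.group("level") == "#" and "Can't save in background" in match.group("msg"):
            failures.append((match.group("when"), match.group("msg")))
    return failures
```

It could be run as `failed_background_saves(open("/path/to/redis_log_file"))`, where the path is a placeholder for the real log location.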

SAVE, on the other hand, works as I would expect. I just took the site
down for a moment to run it. That would explain why we've been losing
so much data, though the file timestamp is still strange. Either way, good
to know we made some progress.

I will dig through the kernel logs re: the possible OOM kill, though it
would make sense given that it seems to die when large datasets are
piped back.

Any thoughts on the BGSAVE OOM issue?

We are running redis with a simple init.d script that just calls
redis-server /path/to/redis.conf

Thanks again.

Kyle

Didier Spezia

Feb 13, 2011, 12:15:46 PM

to Redis DB

If you think you should have enough memory (including swap) for
Redis and all the other processes running on the machine,
you may want to check the overcommit configuration.

On Linux, you need to set overcommit_memory to 1.
More information in the FAQ:
http://redis.io/topics/faq
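
To see the current setting, Linux exposes it in /proc/sys/vm/overcommit_memory. A small sketch for reading and interpreting it follows; the helper names are hypothetical and the descriptions paraphrase the kernel's three overcommit modes.

```python
OVERCOMMIT_PATH = "/proc/sys/vm/overcommit_memory"

MODES = {
    0: "heuristic overcommit (kernel default); a large fork() may be refused",
    1: "always overcommit; the setting the Redis FAQ recommends",
    2: "strict accounting; fork() fails once the commit limit is reached",
}


def read_overcommit_mode(path: str = OVERCOMMIT_PATH) -> int:
    """Read the overcommit mode as an integer from the given file."""
    with open(path) as f:
        return int(f.read().strip())


def describe_overcommit(mode: int) -> str:
    """Return a short description of an overcommit mode."""
    return MODES.get(mode, "unknown mode")
```

If `read_overcommit_mode()` returns anything other than 1, the value can be changed at runtime with `sysctl vm.overcommit_memory=1` and made permanent by adding `vm.overcommit_memory = 1` to /etc/sysctl.conf.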

Regards,
Didier.

Kyle Bragger

Feb 15, 2011, 6:23:35 PM

to Redis DB

So it turns out it was overcommit_memory the whole time. Doh.

We are going to get that fixed up. Thanks /so/ much for all the
assistance.

Now, for a new but related question: are there best practices for
scaling horizontally across multiple servers?

Kyle
