Issue 150 in redis: Copy on write does not seem to work.

re...@googlecode.com

Jan 27, 2010, 2:10:39 PM

to redi...@googlegroups.com

Status: New
Owner: ----
Labels: Type-Defect Priority-Medium

New issue 150 by gshfreesw: Copy on write does not seem to work.
http://code.google.com/p/redis/issues/detail?id=150

What steps will reproduce the problem?

1. Insert varying small (about 1k) String values until the data set is a
little over 50% of RAM. So if you have 4GB of RAM, the redis-server
process should be using over 2.5GB.

2. Now issue a BGSAVE. You will see that the child process is allocated
2.5GB as well, so the total memory allocated is now 5GB. Some of the
parent process's data is swapped out, and the BGSAVE takes a long time to
complete.

3. I do have the overcommit memory setting and am running on an Ubuntu
Hardy box.

What is the expected output?
I expected fork's copy on write (COW) to copy only the address space
mappings of the parent process, not the data itself.

What do you see instead?

I see the child process is allocated the same memory as the parent process.

What version of the product are you using? On what operating system?

using 1.2 on Ubuntu Hardy.

Please provide any additional information below.

We love Redis, but this BGSAVE memory allocation issue has become a serious
issue for us.

re...@googlecode.com

Jan 28, 2010, 3:18:52 PM

to redi...@googlegroups.com

Comment #1 on issue 150 by gshfreesw: Copy on write does not seem to work.
http://code.google.com/p/redis/issues/detail?id=150

Can someone please look into this? This is affecting our deployment.

Thanks!

re...@googlecode.com

Feb 8, 2010, 2:04:46 AM

to redi...@googlegroups.com

Comment #2 on issue 150 by vaticide: Copy on write does not seem to work.
http://code.google.com/p/redis/issues/detail?id=150

We see this as well. When the BGSAVE process kicks off, the main process
goes into swap and performance suffers. We've seen this both on Debian and
Solaris.

re...@googlecode.com

Feb 8, 2010, 2:11:51 AM

to redi...@googlegroups.com

Comment #3 on issue 150 by lukemelia.com: Copy on write does not seem to work.
http://code.google.com/p/redis/issues/detail?id=150

We were seeing this as well on Engine Yard's Gentoo build and moved to AOF
as a result.

re...@googlecode.com

Feb 8, 2010, 5:23:42 AM

to redi...@googlegroups.com

Comment #4 on issue 150 by antirez: Copy on write does not seem to work.
http://code.google.com/p/redis/issues/detail?id=150

Hello. The problem experienced is actually a sum of different problems:

First of all, you can't trust the memory usage reported by Redis. Redis is
able to tell you the number of bytes it requested from malloc(), but it
can't account for malloc overhead, fragmentation, and so forth. So
depending on the kind of keys, Redis may report just 60% of the memory it
is *really* using.

The only way to check the real memory usage is via 'ps'.
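
As a rough illustration of reading the resident size the way ps reports it,
here is a minimal Python sketch. It assumes a ps that supports the "rss"
output column (as procps on Linux and the BSD ps do) and that the value is
in kilobytes; the helper names are made up for this example and are not part
of Redis. Keep in mind that RSS counts pages shared between a parent and a
forked child in both processes, so two large RSS figures do not by
themselves mean the memory was copied.

```python
import subprocess


def parse_rss_kib(ps_output: str) -> int:
    """Parse the output of `ps -o rss= -p PID` (resident set size in KiB)."""
    return int(ps_output.strip())


def kib_to_gib(kib: int) -> float:
    """Convert KiB to GiB for easier reading."""
    return kib / (1024 * 1024)


def resident_kib(pid: int) -> int:
    """Ask ps for the resident set size of one process, in KiB.

    Assumption: `ps` is on PATH and understands `-o rss=` and `-p`.
    """
    out = subprocess.run(
        ["ps", "-o", "rss=", "-p", str(pid)],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_rss_kib(out)
```

For example, kib_to_gib(resident_kib(pid)) gives the resident size in GiB
for the redis-server process with the given pid.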

So:

1) Most of you are nearly out of memory when BGSAVE is called, and the OS
will indeed swap, since copy on write still allocates some memory for the
new process and, as the server is active, a given percentage of pages gets
copied.
2) Solaris DOES NOT implement Copy on Write. Don't use Redis with Solaris
unless it's just a cache with no persistence required.
3) If you are nearly out of memory, AOF will hardly save you. From time to
time you need to run BGREWRITEAOF to keep the log file from getting too
big, and this command uses copy on write as well. But of course you can
call it just once, at night.

SOLUTIONS

1) With Redis 2.0.0 (or Redis master on Git) we have Virtual Memory. If you
are out of memory because your values are large, this will be a new world
for you. If you are out of memory because you have too many keys, this will
not help as much, since Redis VM can swap values, not keys.
2) If you use a one-field-per-key encoding of your objects, switch to
serialized objects if possible. This will require a lot less memory than
what you are using today.
3) Make sure to switch to Redis >= 1.2.x if you are using a lot of integers
as values. This will save you up to 35% of memory.
4) Redis 2.0.0 will implement a new Hash data type that is *very* memory
efficient. This can help as well.
5) If you can't use any of the above, you need more RAM, or you need to
split your dataset across different Redis instances.

But the rule here is: if you see strange stuff on BGSAVE, 99.99% of the
time you are simply out of memory.

re...@googlecode.com

Feb 11, 2010, 7:41:13 AM

to redi...@googlegroups.com

Comment #5 on issue 150 by gshfreesw: Copy on write does not seem to work.
http://code.google.com/p/redis/issues/detail?id=150

Hi,

1. When checking memory usage, I am using top, not the figure reported by
Redis.

2. I am not nearly out of memory when this happens. On a 16GB machine, the
redis process takes 11GB (as reported by top) and top shows 5GB of free
memory (after flushing the disk cache etc.). Here COW should not need more
than 5GB to set up the child process's address space, so that's why I claim
COW does not work as expected.
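
To make the figures in point 2 concrete, here is a small worked calculation
in Python. It uses a deliberately simplified model that is an assumption of
this sketch, not something measured in the thread: the extra memory needed
during a background save is taken to be the parent's resident size times
the fraction of pages modified while the child is alive, ignoring page
tables, the filesystem cache and allocator effects.

```python
def cow_extra_gb(parent_rss_gb: float, dirty_fraction: float) -> float:
    """Extra memory used by copy on write under the simplified model:
    only pages modified while the child exists get duplicated."""
    if not 0.0 <= dirty_fraction <= 1.0:
        raise ValueError("dirty_fraction must be between 0 and 1")
    return parent_rss_gb * dirty_fraction


def break_even_dirty_fraction(parent_rss_gb: float, free_gb: float) -> float:
    """Fraction of the parent's pages that must be rewritten during the
    save before the copies no longer fit in free memory."""
    return min(1.0, free_gb / parent_rss_gb)


# Figures from the comment: 11GB resident, 5GB free.
print(round(break_even_dirty_fraction(11, 5), 3))  # 0.455
```

Under this model, the copies would only exceed the 5GB of free memory if
roughly 45% of the pages were modified while the save was running.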

We finally solved the issue by sharding into ten redis processes. Each
redis server takes about 1.6GB, and saving is not an issue since the child
process is allocated only 1.6GB when saving. We have disabled saves in the
config file and issue bgsave commands from cron, thereby saving on one
server at a time.
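
The "one server at a time" approach could be sketched roughly as follows in
Python. This is not the poster's actual cron setup. The SaveClient interface
is hypothetical: it stands for any client object exposing the Redis BGSAVE
and LASTSAVE commands as bgsave() and lastsave(). The sketch assumes that
the LASTSAVE timestamp changes once a background save has completed; since
that timestamp has one-second resolution, back-to-back saves within the same
second would not be detected.

```python
import time
from typing import Callable, Iterable, Protocol


class SaveClient(Protocol):
    """Hypothetical minimal client interface, not a real library API."""

    def bgsave(self) -> None: ...
    def lastsave(self) -> float: ...


def staggered_bgsave(
    clients: Iterable[SaveClient],
    poll_seconds: float = 1.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Run BGSAVE on one instance at a time; return how many completed."""
    completed = 0
    for client in clients:
        before = client.lastsave()
        client.bgsave()
        deadline = clock() + timeout_seconds
        # Wait until LASTSAVE changes before moving to the next instance,
        # so that only one child process exists at any moment.
        while client.lastsave() == before:
            if clock() >= deadline:
                raise TimeoutError("background save did not finish in time")
            sleep(poll_seconds)
        completed += 1
    return completed
```

Waiting for each save to finish before starting the next one keeps the peak
extra memory bounded by the largest single instance rather than by the sum
of all of them.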

Anyway, on the COW question we need to check with the Linux kernel team, as
it might be a bug in the Linux kernel (memory is so cheap that maybe people
have not noticed it at all).

re...@googlecode.com

Feb 28, 2010, 12:41:03 PM

to redi...@googlegroups.com

Comment #6 on issue 150 by didier.06: Copy on write does not seem to work.
http://code.google.com/p/redis/issues/detail?id=150

Hi,

some remarks:

1. Recent versions of Solaris do indeed implement COW. This can be easily
demonstrated using a small forking program and the "pmap -xs pid1 pid2"
command. See the attached source file for an example.

2. However, Solaris memory overcommit is much stricter than on Linux, and
you probably still need twice the redis memory in swap space to be safe on
Solaris. See an explanation at:
http://developers.sun.com/solaris/articles/subprocess/subprocess.html

3. On any modern Unix/Linux system, it is difficult to accurately evaluate
memory consumption. In particular, tools like ps, top, htop, prstat, etc.
often do not take shared memory into account very well. I usually rely on
the memory map of the processes (pmap), which gives a much clearer picture.

4. Not all memory is subject to COW. In particular, the page translation
tables needed by the virtual memory mechanism are not. On typical 64-bit
Intel/AMD hardware the page size is 4096 bytes and entries in the
translation tables are 8 bytes each, so 16 GB of virtual memory actually
requires at least 32 MB of memory which will never be shared across
processes (with a typical Linux kernel).

5. On top of that, of course, the more write operations are sent to redis
during the dump, the more extra memory will be consumed, so it also depends
on your traffic. You need to take this into account when you size your
memory.

6. I don't think there is a bug in the Linux kernel regarding COW,
otherwise there would be plenty of problems with large multi-process
applications such as Oracle or PostgreSQL.

7. Despite a good COW implementation, Linux can indeed start swapping when
a dump is generated. This is not because of a shortage of memory, but
because Linux tends to be too aggressive in using the filesystem cache.
Basically, the OS has to find a balance between memory allocated to
processes and the filesystem cache. When you write very big files, it puts
pressure on the filesystem cache at the expense of memory allocated to
processes, so the system starts swapping. You can mitigate this problem by
setting a low value for the vm.swappiness system parameter. However, this
does not completely prevent the problem from happening. This is a known
problem, experienced by everyone trying to run large I/O-bound databases on
Linux. The only way to alleviate the problem would be to use O_DIRECT I/O
to generate the dump, but this requires a code change in redis (it would be
an interesting option to add). The dump would be slower, but no swapping
would occur provided the ongoing traffic does not alter redis memory too
much.

8. Putting several redis instances on the same box can be dangerous: if you
are unlucky, all the dumps will occur at the same time, and you will be in
exactly the same situation as before. A semaphore to limit the number of
concurrent dumps across different redis instances would be another useful
option for people working on large multi-core boxes.
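
One way the semaphore idea in point 8 might look from outside redis is an
advisory file lock shared by whatever scripts trigger the dumps. The Python
sketch below is only an illustration under that assumption: it is not an
existing redis option, the name dump_slot and the lock file path are made
up, and it allows just one dump at a time (a count of one). Allowing N
concurrent dumps would need N such lock files or a different primitive.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def dump_slot(lock_path: str, blocking: bool = True):
    """Hold an exclusive advisory lock for the duration of one dump.

    With blocking=False, BlockingIOError is raised if another process
    (or another open file description) already holds the lock.
    """
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        flags = fcntl.LOCK_EX if blocking else fcntl.LOCK_EX | fcntl.LOCK_NB
        fcntl.flock(fd, flags)
        try:
            yield
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)
```

A cron job for each instance would wrap its BGSAVE call, and the wait for
its completion, in "with dump_slot(path):" using the same path on every
instance, so that dumps on the same box queue up instead of overlapping.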

Hope this helps ...

Regards,
Didier.

Attachments:
toto.c 1017 bytes

re...@googlecode.com

Feb 28, 2010, 1:06:33 PM

to redi...@googlegroups.com

Comment #7 on issue 150 by antirez: Copy on write does not seem to work.
http://code.google.com/p/redis/issues/detail?id=150

Hello Didier! thanks a lot for your comment, it's full of useful
information.

A few notes about your different points:

1) Glad to hear that recent Solaris implements this feature. Without it,
Solaris and Redis were really not a great fit.

2) OK, probably worth adding to the documentation.

3) I agree, I think there is no way ps or top can take shared pages into
account, for instance.

4) Indeed, but this is very little memory. The time to copy this on fork()
is acceptable most of the time. When there are real-time-like requirements,
the only option is to set up a "saving slave" and disable saving in the
master, I guess. The append only file setup also helps a lot with this: one
can rewrite the append only file just every hour or more. It depends a lot
on the read/write ratio.

5) Yes, I stress this often, but if you do some math, even for very high
traffic sites with for instance 10,000k queries per second, it's hard to
touch many pages, since these operations will often touch a large
percentage of common pages. Still, this is indeed something to consider.

6) I don't think Linux COW is buggy either.

7) O_DIRECT can be interesting, but so far, even with systems that are
almost out of memory and saving every few minutes, we have been unable to
see such a problem in practice. For instance superfeedr is running N Redis
systems with this setup without experiencing problems. So maybe the buffers
are still a fairly small amount of memory compared to the rest?

8) O_DIRECT can be an interesting idea. There is another option, which is
to disable auto-saving altogether and write a little daemon to issue the
SAVE command in the different instances. Btw, apart from the buffers
allocated during intensive I/O operations, there is also the problem of
having multiple processes doing a lot of disk I/O at the same time, which
can slow things down considerably.
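
To put point 4 in numbers, the page table figure quoted in the previous
comment can be checked with a few lines of Python. The sketch only counts
last-level entries, using the 4096-byte page and 8-byte entry sizes given
there; it ignores the upper levels of the table and assumes no huge pages
are in use.

```python
PAGE_SIZE = 4096  # bytes per page on typical 64-bit Intel/AMD hardware
PTE_SIZE = 8      # bytes per page table entry

GIB = 1024 ** 3
MIB = 1024 ** 2


def leaf_page_table_bytes(mapped_bytes: int) -> int:
    """Bytes of last-level page table entries needed to map mapped_bytes."""
    pages = -(-mapped_bytes // PAGE_SIZE)  # ceiling division
    return pages * PTE_SIZE


print(leaf_page_table_bytes(16 * GIB) // MIB)  # 32
```

That is one part in 512 of the mapped memory, or about 0.2%.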

It will be very interesting to evaluate how VM and Redis play together in
extreme conditions, but so far experience has shown that this is working
very well. Still, for very read-intensive applications where writes are
comparatively rare, I suggest using the AOF setup.

Thanks again,
Salvatore

re...@googlecode.com

Aug 24, 2010, 6:02:23 AM

to redi...@googlegroups.com

Updates:
Status: WontFix

Comment #8 on issue 150 by antirez: Copy on write does not seem to work.
http://code.google.com/p/redis/issues/detail?id=150

(No comment was entered for this change.)
