Memory required for background save


Michal Frackowiak

May 1, 2009, 7:45:47 PM5/1/09
to Redis DB
Hi,

when discussing Issue 30 Salvatore mentioned some details about the
background saving - http://code.google.com/p/redis/issues/detail?id=30&can=1#c8
:

> About memory usage when bgsaving: basically it uses fork(), which implements
> copy-on-write semantics for memory pages. This means that the memory really
> consumed is proportional to the number of changes the dataset gets while a
> bgsave is in progress. If a small number of keys are changing, bgsave will
> consume little additional memory; on the other hand, if all the keys change
> while a large bgsave is running, the memory used will be the same amount of
> RAM as the main process.

My situation is:
. DB 0: 6480585 keys (0 volatile) in 8388608 slots HT.
. 34 clients connected (0 slaves), 2951156568 bytes in use

MemTotal: 7872040 kB
MemFree: 2544592 kB
Buffers: 75716 kB
Cached: 591420 kB
SwapCached: 0 kB
Active: 4395280 kB
Inactive: 639972 kB
SwapTotal: 0 kB
SwapFree: 0 kB
....

The problem is that although data is just 3 GB and the machine has ~
2.5 GB free, periodical bgsave produces the error:

- 1 changes in 1800 seconds. Saving...
* Can't save in background: fork: Cannot allocate memory

The second process does not even start. I guess this is not the desired
behaviour, or at least not optimal. Or is something going wrong?

The system is Ubuntu 9.04, 64bit on a EC2 large instance, using the
latest Redis from git repo.

Any clue on this one?

Michal

Salvatore Sanfilippo

May 3, 2009, 9:06:06 AM5/3/09
to redi...@googlegroups.com
On Sat, May 2, 2009 at 1:45 AM, Michal Frackowiak <redb...@openlabs.pl> wrote:

> The problem is that although data is just 3 GB and the machine has ~
> 2.5 GB free, periodical bgsave produces the error:

Hello Michal!

From the Linux fork man page:

> Under Linux, fork() is implemented using copy-on-write
> pages, so the only penalty that it incurs is the time and
> memory required to duplicate the parent's page tables,
> and to create a unique task structure for the child.

And later about ENOMEM error of fork:

ENOMEM
fork() failed to allocate the necessary kernel structures because
memory is tight.

Basically, in modern operating systems fork() will just copy the process
structure and the mapping between the address space and physical memory
pages. It should not fail unless the system is almost out of memory.

What you noticed is not sane behavior and looks a lot like a failure of
the EC2 virtualization technology...
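
The copy-on-write semantics described above can be demonstrated with a small Python sketch (an editorial illustration, not Redis code; it assumes a Unix system where os.fork() is available):

```python
import os

data = {"k": "v1"}
r, w = os.pipe()
pid = os.fork()
if pid == 0:
    # Child: its address space is a copy-on-write snapshot of the parent's,
    # so it keeps seeing the pre-fork value no matter what the parent does.
    os.close(r)
    os.write(w, data["k"].encode())
    os._exit(0)
os.close(w)
data["k"] = "v2"  # parent's write makes the kernel copy only the touched page
child_view = os.read(r, 16).decode()
os.waitpid(pid, 0)
print(child_view, data["k"])  # child saw "v1" while the parent now holds "v2"
```

This is why a BGSAVE child can serialize a consistent snapshot while the parent keeps serving writes, and why only the pages modified during the save cost extra memory.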

Btw, under Mac OS X I tried a few days ago to load a dataset as large as
memory, and BGSAVE worked without any problem. I will try the same against
a Linux box in the next few days, but it should work without problems.

Cheers,
Salvatore

>
> - 1 changes in 1800 seconds. Saving...
> * Can't save in background: fork: Cannot allocate memory
>
> The second process is not even starting. I guess this is not a desired
> behaviour, or at least not really optimal. Or something is going
> wrong?
>
> The system is Ubuntu 9.04, 64bit on a EC2 large instance, using the
> latest Redis from git repo.
>
> Any clue on this one?
>
> Michal
> >
>



--
Salvatore 'antirez' Sanfilippo
http://invece.org

Michal Frackowiak

May 4, 2009, 2:27:56 AM5/4/09
to Redis DB
On May 3, 3:06 pm, Salvatore Sanfilippo <anti...@gmail.com> wrote:

> Basically, in modern operating systems fork() will just copy the process
> structure and the mapping between the address space and physical memory
> pages. It should not fail unless the system is almost out of memory.
>
> What you noticed is not sane behavior and looks a lot like a failure of
> the EC2 virtualization technology...
>
> Btw, under Mac OS X I tried a few days ago to load a dataset as large as
> memory, and BGSAVE worked without any problem. I will try the same against
> a Linux box in the next few days, but it should work without problems.

Hmm... EC2 being the cause is what I initially suspected too, but I
could not find any known memory problem with fork. If EC2
virtualization is the problem, that is bad news.

I will try other Linux distros on EC2. So far the problem appeared on
small (32bit) and large (64bit) boxes, running Ubuntu 9.04 images from
http://alestic.com (ami-bf5eb9d6 and ami-bc5eb9d5). It is a new distro
and it _could_ be causing problems...

I will keep you informed - could save others from some headache.

Michal

Salvatore Sanfilippo

May 4, 2009, 7:43:07 AM5/4/09
to redi...@googlegroups.com
On Mon, May 4, 2009 at 8:27 AM, Michal Frackowiak <redb...@openlabs.pl> wrote:

> Hmm... EC2 being the cause is what I initially suspected too, but I
> could not find any known memory problem with fork. If EC2
> virtualization is the problem, it means this is bad.
>
> I will try other Linux distros on EC2. So far the problem appeared on
> small (32bit) and large (64bit) boxes, running Ubuntu 9.04 images from
> http://alestic.com (ami-bf5eb9d6 and ami-bc5eb9d5). It is a new distro
> and it _could_ be causing problems...
>
> I will keep you informed - could save others from some headache.

Ok, further investigation showed that this is probably due to
/proc/sys/vm/overcommit_memory.
If it's zero on your system, try setting it to 1 or 2 and check what
happens. I'm going to try this in a few hours.

Cheers,
Salvatore

Michal Frackowiak

May 4, 2009, 8:31:34 AM5/4/09
to Redis DB
> Ok further investigations showed that probably this is due to
> /proc/sys/vm/overcommit_memory
> If it's zero in your system try to set it to 1 or 2 and check what
> happens. I'm going to try this in few hours.

echo 1 > /proc/sys/vm/overcommit_memory

works perfectly! So the problem was the kernel _estimating_ how much
memory the forked process would need. Echoing "1", as I understand it,
disables the check and allows the process to fork.

Since "0" is the default for overcommit_memory, perhaps the issue is
much more common on Linux boxes. It also looks like Mac OS X is free
of this issue.

If confirmed, it would be nice to have it added to the FAQ.

Great job and many thanks again

Michal
http://michalfrackowiak.com

Salvatore Sanfilippo

May 4, 2009, 10:00:05 AM5/4/09
to redi...@googlegroups.com
On Mon, May 4, 2009 at 2:31 PM, Michal Frackowiak <redb...@openlabs.pl> wrote:
>
>> Ok further investigations showed that probably this is due to
>> /proc/sys/vm/overcommit_memory
>> If it's zero in your system try to set it to 1 or 2 and check what
>> happens. I'm going to try this in few hours.
>
> echo 1 > /proc/sys/vm/overcommit_memory
>
> works perfectly! So the problem was with the kernel _estimating_ how
> much memory would the forked process need. Echoing "1" as I understand
> disables the check and enables the process to fork.

Exactly. I'm going to make two changes in Redis about this:

1) If compiled for Linux and overcommit_memory is zero when the server
starts, it logs a warning: "/proc/sys/vm/overcommit_memory is set to
zero, background saving may fail, please add echo 1 > ... in your
startup scripts".

2) While a bgsave is in progress, Redis will not try to resize the hash
table to maintain the right ratio between used and empty buckets: these
memory "movements" can result in the copy-on-write of a lot of memory
pages.
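
The startup check in (1) could be sketched roughly like this (a hypothetical helper, not the actual Redis C code; the function name and message wording are assumptions):

```python
def overcommit_warning(value):
    # Given the contents of /proc/sys/vm/overcommit_memory (Linux only),
    # return a warning string when heuristic overcommit checking ("0")
    # could make fork() of a large process fail with ENOMEM.
    if value.strip() == "0":
        return ("WARNING: /proc/sys/vm/overcommit_memory is set to zero, "
                "background saving may fail; set it to 1 in your startup scripts")
    return None
```

On a Linux box it would be driven by something like `overcommit_warning(open("/proc/sys/vm/overcommit_memory").read())`.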

This is also why an idea that looks good in theory is actually not:
removing keys in the background process while saving, in order to free
memory. Most of the time, removing keys will instead result in
copy-on-write of pages.

> Since "0" is default of overcommit_memory, perhaps the issue is much
> more common on Linux boxes. It also looks like MacOSX is free of this
> issue.

Yep seems like Mac OS X is more optimistic by default :)

> If confirmed, it would be nice to have it added to the FAQ.

Sure I'm going to add this.

> Great job and many thanks again

Thanks to you!

Ciao,
Salvatore

> Michal
> http://michalfrackowiak.com

Salvatore Sanfilippo

May 4, 2009, 10:26:50 AM5/4/09
to redi...@googlegroups.com
On Mon, May 4, 2009 at 4:00 PM, Salvatore Sanfilippo <ant...@gmail.com> wrote:
> Exactly, I'm going to do two changes in Redis about this:
>
> 1) If compiled for Linux and overcommit_memory is zero when the server
> starts it logs a warning: "/proc/sys/vm/overcommit_memory is set to
> zero, background saving may fail, please add echo 1 > ... in your
> startup scripts".
>
> 2) When a bgsave is in progress Redis will not try to resize the hash
> table to maintain the right ratio between used and empty buckets: this
> memory "movements" can result in the copy-on-write of a lot of memory
> pages.

Both changes just pushed on Git.

Cheers,
Salvatore

Brenden Grace

May 4, 2009, 2:17:24 PM5/4/09
to redi...@googlegroups.com
On Mon, May 4, 2009 at 10:26 AM, Salvatore Sanfilippo <ant...@gmail.com> wrote:
>
> On Mon, May 4, 2009 at 4:00 PM, Salvatore Sanfilippo <ant...@gmail.com> wrote:
>> Exactly, I'm going to do two changes in Redis about this:
>>
>> 1) If compiled for Linux and overcommit_memory is zero when the server
>> starts it logs a warning: "/proc/sys/vm/overcommit_memory is set to
>> zero, background saving may fail, please add echo 1 > ... in your
>> startup scripts".
>>
>> 2) When a bgsave is in progress Redis will not try to resize the hash
>> table to maintain the right ratio between used and empty buckets: this
>> memory "movements" can result in the copy-on-write of a lot of memory
>> pages.
>
> Both changes just pushed on Git.

Perfect timing. I just ran into these issues over the weekend. The
overcommit_memory setting allowed me to push the limits of the RAM on
the system (64G):

INFO:

{:uptime_in_seconds=>"11743",
:changes_since_last_save=>"0",
:uptime_in_days=>"0",
:bgsave_in_progress=>"0",
:redis_version=>"0.100",
:last_save_time=>"1241459212",
:connected_clients=>"1",
:total_connections_received=>"8",
:connected_slaves=>"0",
:total_commands_processed=>"320230613",
:used_memory=>"50910391075"}


1. used_memory only reports 47.41G, but my redis process seems to be
more like 62G. The system's swap is completely full and nothing else
is really taking up any RAM, so I wonder what exactly used_memory is a
sum of?

2. Obviously, when you hit swap, performance goes way down. Five
clients inserting records as fast as they could stayed pretty constant
at about 10k inserts per 1.5 seconds each ... when Redis started to
swap, that same 10k took 5 seconds, then 8, then 16, then 21, and on
and on.

3. Redis was able to write out the dump even when it was hitting the
swap space. It wasn't pretty but it wrote it out even with 6 of the 16
cores completely pegged and all available RAM and swap allocated.

4. Oddly, my dump.rdb is only 4.8G for the above redis process. I do
NOT have sharedobjects set to yes, and so this value is much smaller
than I would have expected. The keys are variable in size but the data
is the same two bytes over and over. Is this the result of
compression?

5. Is there any way the INFO command could list the number of keys or
"objects" in the system? This would be very useful. I think we
inserted ~317 million.

6. We would love the maxmemory setting. Any thoughts on when this
might be added?

Looking very promising Salvatore!

--
Brenden C Grace

Salvatore Sanfilippo

May 4, 2009, 2:35:33 PM5/4/09
to redi...@googlegroups.com
On Mon, May 4, 2009 at 8:17 PM, Brenden Grace <brende...@gmail.com> wrote:

Hello Brenden,

> 1. used_memory is only report 47.41G, but my redis process seems to be
> more like 62G. The system's swap is completely full and nothing else
> is really taking up any RAM, so I wonder what exactly used_memory is a
> sum of?

It's the sum of all the allocated memory, but internally malloc() also
wastes memory through fragmentation and through the bookkeeping it
stores about the chunks of allocated memory.

> 2. Obviously when you hit the swap performance goes way down. Five
> clients inserting records as fast as they could stayed pretty constant
> at about 10k inserts in 1.5 seconds each ... when Redis started to
> swap that same 10k went to 5 seconds each, then 8, then 16, then 21
> and on and on.

I expected even worse than this... I wonder what the "curve" of
performance degradation is. I mean, to reach 21 seconds did it take
just a few more keys or, for example, 20% more keys?

> 3. Redis was able to write out the dump even when it was hitting the
> swap space. It wasn't pretty but it wrote it out even with 6 of the 16
> cores completely pegged and all available RAM and swap allocated.

Nice

> 4. Oddly, my dump.rdb is only 4.8G for the above redis process. I do
> NOT have sharedobjects set to yes, and so this value is much smaller
> than I would have expected. The keys are variable in size but the data
> is the same two bytes over and over. Is this the result of
> compression?

Yep, this is probably the result of compression, if the values had a lot
of redundancy. Note that Redis uses both compression of string values and
integer encoding: keys and values that look like an integer, and that can
be reconstructed bit-by-bit from their integer representation, are saved
as integers. So for example "2112938487" will only take four bytes in the
DB dump, but instead " 2112938487" will be stored as a string since there
is a space before the number.
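
The "reconstructed bit-by-bit" rule boils down to a round-trip test, sketched here in Python (a hypothetical helper, not Redis's actual encoder):

```python
def can_int_encode(s):
    # A string is eligible for integer encoding only if parsing it as a
    # base-10 integer and printing it back yields the identical string.
    try:
        return str(int(s, 10)) == s
    except ValueError:
        return False
```

So `can_int_encode("2112938487")` holds, while `" 2112938487"` and `"007"` fail the round trip and must be stored as plain strings.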

> 5. Is there anyway the INFO command could list the numbers of keys or
> "objects" in the system? This would be very useful. I think we
> inserted ~317 million.

Sure, I'll add it in seconds.
317 million looks cool :)

> 6. We would love the maxmemory setting. Any thoughts on when this
> might be added?

I'll work on this tomorrow. Just to make sure it makes sense, this is
how I want to implement it:

When maxmemory is reached:

- The server replies with -ERR to every "write" command received, except
DEL, which will continue to work
- every second it will try to reclaim space by expiring volatile keys
(keys with timeouts associated). The algorithm will be something like
this:

while(memory_used >= maxmemory) {
    Get five keys with a timeout associated, at random
    Select the key with the shortest time to live
    Delete this key
}

- every second it will try to reclaim space by removing elements from
the Redis objects free-list cache.
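
The sampling loop above can be simulated in a few lines of Python (an editorial sketch: the names are made up, and "memory" is just a key count here rather than bytes):

```python
import random

def evict_step(ttls, memory_used, maxmemory):
    # Sample up to five keys that have a timeout, evict the one closest
    # to expiring, and repeat until usage drops below the limit.
    while memory_used() >= maxmemory and ttls:
        sample = random.sample(list(ttls), min(5, len(ttls)))
        victim = min(sample, key=lambda k: ttls[k])  # shortest time to live
        del ttls[victim]

# Toy run: three volatile keys, limit of two "units" of memory.
ttls = {"a": 5, "b": 1, "c": 9}
evict_step(ttls, lambda: len(ttls), 2)
print(ttls)  # {"c": 9}
```

Since the sample covers all remaining keys in this toy case, "b" (TTL 1) goes first, then "a", leaving only {"c": 9}.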

Does this look sane?

> Looking very promising Salvatore!

Thanks! Your testing is impressive.

Brenden Grace

May 4, 2009, 3:19:45 PM5/4/09
to redi...@googlegroups.com
On Mon, May 4, 2009 at 2:35 PM, Salvatore Sanfilippo <ant...@gmail.com> wrote:
>> 4. Oddly, my dump.rdb is only 4.8G for the above redis process. I do
>> NOT have sharedobjects set to yes, and so this value is much smaller
>> than I would have expected. The keys are variable in size but the data
>> is the same two bytes over and over. Is this the result of
>> compression?
>
> Yep, this is probably the result of compression, if the values had a lot
> of redundancy. Note that Redis uses both compression of string values and
> integer encoding: keys and values that look like an integer, and that can
> be reconstructed bit-by-bit from their integer representation, are saved
> as integers. So for example "2112938487" will only take four bytes in the
> DB dump, but instead "   2112938487" will be stored as a string since
> there is a space before the number.

I noticed it took Redis a while to start back up when I restarted the
process, so I hacked Redis to exit after it had finished loading the
DB file mentioned above: it took ~11 minutes. It's not an issue for
me, but I thought I would pass it along for those who might be
interested in data loading times ...

# time ./redis-server redis.conf
- Server started, Redis version 0.100
- DB loaded from disk

real 11m0.194s
user 9m42.540s
sys 1m15.910s

--
Brenden C Grace

Salvatore Sanfilippo

May 4, 2009, 6:06:50 PM5/4/09
to redi...@googlegroups.com
On Mon, May 4, 2009 at 9:19 PM, Brenden Grace <brende...@gmail.com> wrote:

> I noticed it took Redis a while to start back up when I restarted the
> process, so I hacked Redis to exit after it had finished loading in
> the DB file mentioned above and it took ~11 minutes. Its not an issue
> for me, but I thought I would pass it along for those who might be
> interested in data loading times ...
>
> # time ./redis-server redis.conf
> - Server started, Redis version 0.100
> - DB loaded from disk
>
> real    11m0.194s
> user    9m42.540s
> sys     1m15.910s

Hello Brenden,

Loading times are proportional not only to the size of the DB but also
to the number of keys stored inside. If this is a ~300 million key DB
then 11 minutes can be OK, but I think Redis is not optimizing the
loading time enough here; it can be faster, since what happens during
the loading of the DB is that the hash table resizes multiple times.

If Redis instead saves the number of elements in new releases, it can
resize the hash table just once, before starting to load the keys.
This should save enough time, and it can be implemented in a backward
compatible way, as a hint that a DB may or may not contain.
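
That hint amounts to choosing the final power-of-two table size up front, e.g. (a sketch; the helper name is an assumption, but the slot counts Redis reports are powers of two):

```python
def presize(n_keys):
    # Smallest power-of-two slot count that can hold n_keys without
    # forcing an incremental rehash during load.
    size = 1
    while size < n_keys:
        size *= 2
    return size
```

With the 6480585 keys from the first message this yields the 8388608-slot table Redis reported, allocated once instead of grown through many intermediate rehashes.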

Adding this to the post Redis-1.0 todo list for now.

Ciao,
Salvatore

>
> --
> Brenden C Grace